Age | Commit message (Collapse) | Author | Files | Lines |
|
Since they will be used for both KFD and KGD user queues,
rename them from kfd to userq. No intended functional
change.
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Unmap user queues on suspend and map them on resume.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Switch from a bool to an enum and allow more options
for enforce isolation. There are now 3 modes of operation:
- Disabled (0)
- Enabled (serialization and cleaner shader) (1)
- Enabled in legacy mode (no serialization or cleaner shader) (2)
This provides better flexibility for more use cases.
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Checking hive is more readable.
The following smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6820 amdgpu_pci_slot_reset()
warn: iterator used outside loop: 'tmp_adev'
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Kernel doorbell BOs needs to be freed before ttm_fini.
Fixes: 54c30d2a8def ("drm/amdgpu: create kernel doorbell pages")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
disable ASPM with some ASICs on some specific platforms.
required from PCIe controller owner.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
So we can iterate across them when we need to manage
all user queues.
v2: add uq_mgr to adev list in amdgpu_userq_mgr_init
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove MES self test as this conflicts the userqueue fence
interrupts.
v2:(Christian)
- remove the amdgpu_mes_self_test() function and any now unused code.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support to handle the userqueue protected fence signal hardware
interrupt.
Create a xarray which maps the doorbell index to the fence driver address.
This would help to retrieve the fence driver information when an userq fence
interrupt is triggered. Firmware sends the doorbell offset value and
this info is compared with the queue's mqd doorbell offset value.
If they are same, we process the userq fence interrupt.
v1:(Christian):
- use xa_load to extract the fence driver.
- move the amdgpu_userq_fence_driver_process call within the xa_lock
as there is a chance that fence_drv might be freed.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
ASPM doesn't need to be disabled if pcie dpm is disabled.
So ASPM can be independantly enabled.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This is normally handled in the gfx IP suspend callbacks, but
for S0ix, those are skipped because we don't want to touch
gfx. So handle it in device suspend.
Fixes: b9467983b774 ("drm/amdgpu: add dynamic workload profile switching for gfx10")
Fixes: 963537ca2325 ("drm/amdgpu: add dynamic workload profile switching for gfx11")
Fixes: 5f95a1549555 ("drm/amdgpu: add dynamic workload profile switching for gfx12")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove workaround code for the early engineering
samples GC v9.4.3 SOCs with revID 0
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a new parameter to disable BAR resizing. Note that this
only disables the driver from attempting to resize the BAR,
The BIOS may have resized the BAR at boot.
Some teams have found this useful in debugging P2P DMA
issues on systems where the available MMIO space did not allow
for all of the GPUs present to resize their BARs.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support for DPC recover based on refactored code
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Split amdgpu_device_gpu_recover into the following stages:
halt activities,asic reset,schedule resume and amdgpu resume.
The reason is that the subsequent addition of dpc recover
code will have a high similarity with gpu reset
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Note when we switch from one isolation owner to another.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Limiting the number of available VMIDs to enforce isolation causes some
issues with gang submit and applying certain HW workarounds which
require multiple VMIDs to work correctly.
So instead start to track all submissions to the relevant engines in a
per partition data structure and use the dma_fences of the submissions
to enforce isolation similar to what a VMID limit does.
v2: use ~0l for jobs without isolation to distinct it from kernel
submissions which uses NULL for the owner. Add some warning when we
are OOM.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This enables ras to be resumed after gpu recovery on mi350 sriov.
Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VFs cannot read the NAK_COUNTER register. This information is only
available through PMFW metrics.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We need to make sure the workload profile ref counts are
balanced. This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in. Track when we enable the
workload profile so the references are balanced.
v2: switch to a mutex and active flag
v3: fix mutex init
Fixes: 1443dd3c67f6 ("drm/amd/pm: fix and simplify workload handling")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We need to make sure the workload profile ref counts are
balanced. This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in. Track when we enable the
workload profile so the references are balanced.
v2: switch to a mutex and active flag
v3: fix mutex init
Fixes: 8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload profile")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Tested-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We keep the gang submission fence around in adev, make sure that it
stays alive.
v2: fix memory leak on retry
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Free on driver cleanup.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
PCI_CLASS_ACCELERATOR_PROCESSING devices won't ever be
the sysfb, so there is no need to free conflicting
apertures.
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use IP version specific xgmi speed/width for bandwidth calculation.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amdgpu:
- Fix spelling typos
- RAS updates
- VCN 5.0.1 updates
- SubVP fixes
- DCN 4.0.1 fixes
- MSO DPCD fixes
- DIO encoder refactor
- PCON fixes
- Misc cleanups
- DMCUB fixes
- USB4 DP fixes
- DM cleanups
- Backlight cleanups and fixes
- Support platform backlight curves
- Misc code cleanups
- SMU 14 fixes
- JPEG 4.0.3 reset updates
- SR-IOV fixes
- SVM fixes
- GC 12 DCC fixes
- DC DCE 6.x fix
- Hiberation fix
amdkfd:
- Fix possible NULL pointer in queue validation
- Remove unnecessary CP domain validation
- SDMA queue reset support
- Add per process flags
radeon:
- Fix spelling typos
- RS400 hyperZ fix
UAPI:
- Add KFD per process flags for setting precision
Proposed user space: https://github.com/ROCm/ROCR-Runtime/commit/2a64fa5e06e80e0af36df4ce0c76ae52eeec0a9d
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com
|
|
Add support for CPERs on VFs.
VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.
For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.
Signed-off-by: Tony Yi <Tony.Yi@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.15-2025-02-21:
amdgpu:
- Add OEM i2c support for RGB lights, etc.
- Add support for GC 11.5.3
- Add support for GC 11.5.2
- Add support for SDMA 6.1.3
- Add support for NBIO 7.11.2
- Add support for NBIO 7.9.1
- Add support for MMHUB 3.3.2
- Add support for MMHUB 1.8.1
- Add support for SMU 14.0.5
- Add support for SMUIO 13.0.11
- Add support for PSP 14.0.5
- Add support for UMC 12.5.0
- Add support for DCN 3.6.0
- JPEG 4.0.3 updates
- Add dynamic workload profile switching for GC 10-12
- support larger vbios sizes
- GC 9.5.0 updates
- SMU 13.0.12 updates
- SMU 13.0.6 updates
- IP discovery updates
- GC 10 queue reset updates
- DCN 4.0.1 updates
- UHBR link rate fixes
- Aborted suspend fix
- Mark gttsize parameter as deprecated
- GC 10 cleaner shader updates
- PSR-SU fixes
- Clean up PM4 headers
- Cursor fixes
- Enable devcoredump for JPEG
- Misc cleanups
- Runpm cleanups
- MES updates
- GC 9 gfxoff fixes
- Vbios fetching cleanups
- Documentation updates
- Update secondary plane handling
- DML2 updates
- SDMA fixes for MI
- Cleaner shader fixes for GC 11/12
- ACA updates
- Initial JPEG queue reset support
- RAS updates
- Initial RAS CPER support
- DCN/DCE panic screen handling cleanup
- BT2020 fixes
- SR-IOV fixes
amdkfd:
- synchronize pasid values between KGD and KFD
- Misc cleanups
- Improve GTT/VRAM handling for APUs
- Topology updates
- Fix user queue validation on GC 7/8
UAPI:
- Enable "Broadcast RGB" drm property
- Add INFO IOCTL query for virtualization mode
Proposed userspace:
https://github.com/ROCm/amdsmi/commit/e663bed7d6b3df79f5959e73981749b1f22ec698
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221213651.4176031-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
There was a quirk added to add a workaround for a Sapphire
RX 5600 XT Pulse that didn't allow BAR resizing. However,
the quirk caused a regression with runtime pm on Dell laptops
using those chips, rather than narrowing the scope of the
resizing quirk, add a quirk to prevent amdgpu from resizing
the BAR on those Dell platforms unless runtime pm is disabled.
v2: update commit message, add runpm check
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of get_clockgating_state.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move code about checking aca enabled to the cper init/fini function
to make code clean.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Priority Inversion in SRIOV
RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.
The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.
The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).
Fixes the below:
[ 253.013423] =============================
[ 253.013434] [ BUG: Invalid wait context ]
[ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 253.013464] -----------------------------
[ 253.013475] kworker/0:1/10 is trying to lock:
[ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.013815] other info that might help us debug this:
[ 253.013827] context-{4:4}
[ 253.013835] 3 locks held by kworker/0:1/10:
[ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[ 253.014154] stack backtrace:
[ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14
[ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[ 253.014224] Workqueue: events work_for_cpu_fn
[ 253.014241] Call Trace:
[ 253.014250] <TASK>
[ 253.014260] dump_stack_lvl+0x9b/0xf0
[ 253.014275] dump_stack+0x10/0x20
[ 253.014287] __lock_acquire+0xa47/0x2810
[ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.014321] lock_acquire+0xd1/0x300
[ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.014562] ? __lock_acquire+0xa6b/0x2810
[ 253.014578] __mutex_lock+0x85/0xe20
[ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.014782] ? sched_clock_noinstr+0x9/0x10
[ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.014808] ? local_clock_noinstr+0xe/0xc0
[ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.015029] mutex_lock_nested+0x1b/0x30
[ 253.015044] ? mutex_lock_nested+0x1b/0x30
[ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu]
[ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[ 253.017493] local_pci_probe+0x4b/0xb0
[ 253.017746] work_for_cpu_fn+0x1a/0x30
[ 253.017995] process_one_work+0x21e/0x680
[ 253.018248] worker_thread+0x190/0x330
[ 253.018500] ? __pfx_worker_thread+0x10/0x10
[ 253.018746] kthread+0xe7/0x120
[ 253.018988] ? __pfx_kthread+0x10/0x10
[ 253.019231] ret_from_fork+0x3c/0x60
[ 253.019468] ? __pfx_kthread+0x10/0x10
[ 253.019701] ret_from_fork_asm+0x1a/0x30
[ 253.019939] </TASK>
v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).
Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao <lin.cao@amd.com>
Cc: Jingwen Chen <Jingwen.Chen2@amd.com>
Cc: Victor Skvortsov <victor.skvortsov@amd.com>
Cc: Zhigang Luo <zhigang.luo@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
And initialize it, this is a pure software ring to store RAS CPER data.
v2: change ring size to 0x100000
v2: update the initialization of count_dw of cper ring, it's dword
variable
v3: skip VM inv eng for cper
v3: init/fini when aca enabled
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Introduce utility functions designed to assist
in populating CPER records.
v2: call cper_init/fini in device_ip_init/fini.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use DRM's device wedged event to notify userspace that a reset had
happened. For now, only use `none` method meant for telemetry
capture.
In the future we might want to report a recovery method if the reset didn't
succeed.
Acked-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204070528.1919158-6-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Keep VBIOS image read optional for select SOCs in passthrough mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Certain SOCs may not need much data from VBIOS. Some data like VBIOS
version used will be missed but it doesn't affect functionality. Add a
flag to make VBIOS image optional.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead of read_bios, use get_bios_flags to get various options around
reading VBIOS.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use bios_release wrapper to release memory allocated for vbios image and
reset the variables.
v2:
Use the same wrapper for clean up in sw_fini (Alex Deucher)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
No functional change. Rework the code to allow for
adding some additional i2c buses in conjunction with DC
in the future.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
drm_sched_init() has a great many parameters and upcoming new
functionality for the scheduler might add even more. Generally, the
great number of parameters reduces readability and has already caused
one missnaming, addressed in:
commit 6f1cacf4eba7 ("drm/nouveau: Improve variable name in
nouveau_sched_init()").
Introduce a new struct for the scheduler init parameters and port all
users.
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv
Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched
Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org
|
|
If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and
width, ensuring that the function can still determine these
capabilities from the device itself.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth()
error: we previously assumed 'parent' could be null (see line 6180)
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device *adev,
6171 enum pci_bus_speed *speed,
6172 enum pcie_link_width *width)
6173 {
6174 struct pci_dev *parent = adev->pdev;
6175
6176 if (!speed || !width)
6177 return;
6178
6179 parent = pci_upstream_bridge(parent);
6180 if (parent && parent->vendor == PCI_VENDOR_ID_ATI) {
^^^^^^
If parent is NULL
6181 /* use the upstream/downstream switches internal to dGPU */
6182 *speed = pcie_get_speed_cap(parent);
6183 *width = pcie_get_width_cap(parent);
6184 while ((parent = pci_upstream_bridge(parent))) {
6185 if (parent->vendor == PCI_VENDOR_ID_ATI) {
6186 /* use the upstream/downstream switches internal to dGPU */
6187 *speed = pcie_get_speed_cap(parent);
6188 *width = pcie_get_width_cap(parent);
6189 }
6190 }
6191 } else {
6192 /* use the device itself */
--> 6193 *speed = pcie_get_speed_cap(parent);
^^^^^^ Then we are toasted here.
6194 *width = pcie_get_width_cap(parent);
6195 }
6196 }
Fixes: 757e8b951ce2 ("drm/amdgpu: cache gpu pcie link width")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Get the PCIe link with of the device itself (or it's
integrated upstream bridge) and cache that.
v2: fix typo
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
'add ip block' causes a confusion if the blocks are disabled later with
ip_block_mask. Instead change to 'detected' and also add device context.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If the kernel hasn't been compiled with PCIe hotplug support this
can lead to problems with dGPUs that use BOCO because they effectively
drop off the bus.
To prevent issues, disable BOCO support when compiled without PCIe hotplug.
Reported-by: Gabriel Marcano <gabemarcano@yahoo.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VF device sets the RAS flag when mailbox data can't be read properly.
There is no conclusive way to tell if the real source is RAS error.
Therefore VF schedules a KFD based reset which doesn't set RAS source.
SKip checking RAS source for any VF scheduled recovery.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Vojislav Tomasevic <vojislav.tomasevic@amd.com>
Reviewed-by: Yiqing Yao <yiqing.yao@amd.com>
Tested-by: Yiqing Yao <yiqing.yao@amd.com>
Fixes: e1ee2111ca48 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Some of the firmware that is loaded by amdgpu is not actually required.
For example the ISP firmware on some SoCs is optional, and if it's not
present the ISP IP block just won't be initialized.
The firmware loader core however will show a warning when this happens
like this:
```
Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2
```
To avoid confusion for non-required firmware, adjust the amd-ucode helper
to take an extra argument indicating if the firmware is required or
optional.
On optional firmware use firmware_request_nowarn() instead of
request_firmware() to avoid the warnings.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
add gfx950 basic support
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make spelling and punctuation changes to ease reading of the comments.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
As part of the suspend sequence VRAM needs to be evicted on dGPUs.
In order to make suspend/resume more reliable we moved this into
the pmops prepare() callback so that the suspend sequence would fail
but the system could remain operational under high memory usage suspend.
Another class of issues exist though where due to memory fragementation
there isn't a large enough contiguous space and swap isn't accessible.
Add support for a suspend/hibernate notification callback that could
evict VRAM before tasks are frozen. This should allow paging out to swap
if necessary.
Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|