Age | Commit message (Collapse) | Author | Files | Lines |
|
During process kill, drm_sched_entity_flush() will kill the vm
entities. The following job submissions of this process will fail, and
the resources of these jobs have not been released, nor have the fences
been signalled, causing tasks to hang and timeout.
Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to
stopped entity.
v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in
function amdgpu_cs_vm_handling().
Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.")
Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com>
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1)
|
|
It should use vm flags instead of pte flags
to specify bo vm attributes.
Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f)
|
|
The vram block allocation flag must be cleared
before making vram reservation, otherwise reserving
addresses within the currently freed memory range
will always fail.
Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1)
|
|
The fw reserved GFX command is only supported starting from PSP fw
version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command.
Add a version guard to ensure the command is only used when the running
PSP fw meets the minimum version requirement.
This ensures backward compatibility and safe operation across fw
revisions.
Fixes: a3b7f9c306e1 ("drm/amdgpu: reclaim psp fw reservation memory region")
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 065e23170a1e09bc9104b761183e59562a029619)
|
|
Pull drm fixes from Dave Airlie:
"This is the fixes that built up in the merge window, mostly amdgpu and
xe with one i915 display fix, seems like things are pretty good for
rc1.
i915:
- DP LPFS fixes
xe:
- SRIOV: PF fixes and removal of need of module param
- Fix driver unbind around Devcoredump
- Mark xe driver as BROKEN if kernel page size is not 4kB
amdgpu:
- GC 9.5.0 fixes
- SMU fix
- DCE 6 DC fixes
- mmhub client ID fixes
- VRR fix
- Backlight fix
- UserQ fix
- Legacy reset fix
- Misc fixes
amdkfd:
- CRIU fix
- Debugfs fix"
* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
drm/amdgpu: add missing vram lost check for LEGACY RESET
drm/amdgpu/discovery: fix fw based ip discovery
drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
drm/amdgpu: Update SDMA firmware version check for user queue support
drm/amdgpu: Add NULL check for asic_funcs
drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
drm/amd/display: fix a Null pointer dereference vulnerability
drm/amd/display: Add primary plane to commits for correct VRR handling
drm/amdgpu: update mmhub 3.3 client id mappings
drm/amdgpu: update mmhub 3.0.1 client id mappings
drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
drm/amd/display: Don't overwrite dce60_clk_mgr
drm/amdkfd: Fix checkpoint-restore on multi-xcc
drm/amd: Restore cached manual clock settings during resume
drm/amd: Restore cached power limit during resume
drm/amdgpu: Update external revid for GC v9.5.0
drm/amdgpu: Update supported modes for GC v9.5.0
Mark xe driver as BROKEN if kernel page size is not 4kB
...
|
|
Legacy resets reset the memory controllers so VRAM contents
may be unreliable after reset.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aae94897b6661a2a4b1de2d328090fc388b3e0af)
Cc: stable@vger.kernel.org
|
|
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55)
Cc: stable@vger.kernel.org
|
|
With a timeout of only 1 second, my rx 5700XT fails to initialize,
so this increases the timeout to 2s.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9ed3d7bdf2dcdf1a1196630fab89a124526e9cc2)
Cc: stable@vger.kernel.org
|
|
This commit fixes a firmware version check for enabling user queue
support in SDMA v7.0. The previous version check (7836028) was
incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL
commands causing register conflicts between MCU_DBG0 and MCU_DBG1.
Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92e2449241516c95aab95eea91faecd0fa2b7ed5)
Cc: stable@vger.kernel.org
|
|
If driver load fails too early, asic_funcs pointer remains unassigned.
Add NULL check to sanitize unwind path.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 582bf7c5158dce16f7dc5b8345b7876bd8031224)
Cc: stable@vger.kernel.org
|
|
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
v2: fix typos spotted by David Wu.
v3: fix additional typo spotted by David.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e932f4779a2d329841bb9ca70bb80a4bb2d707b6)
Cc: stable@vger.kernel.org
|
|
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505)
Cc: stable@vger.kernel.org
|
|
The field job->vm is used in function amdgpu_job_run to get the page
table re-generation counter and decide whether the job should be skipped.
Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use.
For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed,
then the gfx job should be skipped.
Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ed76936c6b10b547c6df4ca75412331e9ef6d339)
Cc: stable@vger.kernel.org
|
|
Use different external revid for GC v9.5.0 SOCs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 21c6764ed4bfaecad034bc4fd15dd64c5a436325)
Cc: stable@vger.kernel.org
|
|
For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in
NPS2 mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d1ac25c7f830e0132aa816393b1e9f140e71148)
Cc: stable@vger.kernel.org
|
|
Pull drm fixes from Dave Airlie:
"Just a bunch of amdgpu and xe fixes.
amdgpu:
- DSC divide by 0 fix
- clang fix
- DC debugfs fix
- Userq fixes
- Avoid extra evict-restore with KFD
- Backlight fix
- Documentation fix
- RAS fix
- Add new kicker handling
- DSC fix for DCN 3.1.4
- PSR fix
- Atomic fix
- DC reset fixes
- DCN 3.0.1 fix
- MMHUB client mapping fix
xe:
- Fix BMG probe on unsupported mailbox command
- Fix OA static checker warning about null gt
- Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter
- Fix missing unwind goto in GuC/HuC
- Don't register I2C devices if VF
- Clear whole GuC g2h_fence during initialization
- Avoid call kfree for drmm_kzalloc
- Fix pci_dev reference leak on configfs
- SRIOV: Disable CSC support on VF
* tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
drm/xe/vf: Disable CSC support on VF
drm/amdgpu: update mmhub 4.1.0 client id mappings
drm/amd/display: Allow DCN301 to clear update flags
drm/amd/display: Pass up errors for reset GPU that fails to init HW
drm/amd/display: Only finalize atomic_obj if it was initialized
drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported
drm/amd/display: Disable dsc_power_gate for dcn314 by default
drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14
drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access
drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c'
drm/amd/display: fix initial backlight brightness calculation
drm/amdgpu: Avoid extra evict-restore process.
drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop
drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities
drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()
drm/amd/display: Fix divide by zero when calculating min ODM factor
drm/xe/configfs: Fix pci_dev reference leak
drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc()
drm/xe/guc: Clear whole g2h_fence during initialization
drm/xe/vf: Don't register I2C devices if VF
...
|
|
Pull drm updates from Dave Airlie:
"Highlights:
- Intel xe enable Panthor Lake, started adding WildCat Lake
- amdgpu has a bunch of reset improvments along with the usual IP
updates
- msm got VM_BIND support which is important for vulkan sparse memory
- more drm_panic users
- gpusvm common code to handle a bunch of core SVM work outside
drivers.
Detail summary:
Changes outside drm subdirectory:
- 'shrink_shmem_memory()' for better shmem/hibernate interaction
- Rust support infrastructure:
- make ETIMEDOUT available
- add size constants up to SZ_2G
- add DMA coherent allocation bindings
- mtd driver for Intel GPU non-volatile storage
- i2c designware quirk for Intel xe
core:
- atomic helpers: tune enable/disable sequences
- add task info to wedge API
- refactor EDID quirks
- connector: move HDR sink to drm_display_info
- fourcc: half-float and 32-bit float formats
- mode_config: pass format info to simplify
dma-buf:
- heaps: Give CMA heap a stable name
ci:
- add device tree validation and kunit
displayport:
- change AUX DPCD access probe address
- add quirk for DPCD probe
- add panel replay definitions
- backlight control helpers
fbdev:
- make CONFIG_FIRMWARE_EDID available on all arches
fence:
- fix UAF issues
format-helper:
- improve tests
gpusvm:
- introduce devmem only flag for allocation
- add timeslicing support to GPU SVM
ttm:
- improve eviction
sched:
- tracing improvements
- kunit improvements
- memory leak fixes
- reset handling improvements
color mgmt:
- add hardware gamma LUT handling helpers
bridge:
- add destroy hook
- switch to reference counted drm_bridge allocations
- tc358767: convert to devm_drm_bridge_alloc
- improve CEC handling
panel:
- switch to reference counter drm_panel allocations
- fwnode panel lookup
- Huiling hl055fhv028c support
- Raspberry Pi 7" 720x1280 support
- edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
- simple: AUO P238HAN01
- st7701: Winstar wf40eswaa6mnn0
- visionox: rm69299-shift
- Renesas R61307, Renesas R69328 support
- DJN HX83112B
hdmi:
- add CEC handling
- YUV420 output support
xe:
- WildCat Lake support
- Enable PanthorLake by default
- mark BMG as SRIOV capable
- update firmware recommendations
- Expose media OA units
- aux-bux support for non-volatile memory
- MTD intel-dg driver for non-volatile memory
- Expose fan control and voltage regulator in sysfs
- restructure migration for multi-device
- Restore GuC submit UAF fix
- make GEM shrinker drm managed
- SRIOV VF Post-migration recovery of GGTT nodes
- W/A additions/reworks
- Prefetch support for svm ranges
- Don't allocate managed BO for each policy change
- HWMON fixes for BMG
- Create LRC BO without VM
- PCI ID updates
- make SLPC debugfs files optional
- rework eviction rejection of bound external BOs
- consolidate PAT programming logic for pre/post Xe2
- init changes for flicker-free boot
- Enable GuC Dynamic Inhibit Context switch
i915:
- drm_panic support for i915/xe
- initial flip queue off by default for LNL/PNL
- Wildcat Lake Display support
- Support for DSC fractional link bpp
- Support for simultaneous Panel Replay and Adaptive sync
- Support for PTL+ double buffer LUT
- initial PIPEDMC event handling
- drm_panel_follower support
- DPLL interface renames
- allocate struct intel_display dynamically
- flip queue preperation
- abstract DRAM detection better
- avoid GuC scheduling stalls
- remove DG1 force probe requirement
- fix MEI interrupt handler on RT kernels
- use backlight control helpers for eDP
- more shared display code refactoring
amdgpu:
- add userq slot to INFO ioctl
- SR-IOV hibernation support
- Suspend improvements
- Backlight improvements
- Use scaling for non-native eDP modes
- cleaner shader updates for GC 9.x
- Remove fence slab
- SDMA fw checks for userq support
- RAS updates
- DMCUB updates
- DP tunneling fixes
- Display idle D3 support
- Per queue reset improvements
- initial smartmux support
amdkfd:
- enable KFD on loongarch
- mtype fix for ext coherent system memory
radeon:
- CS validation additional GL extensions
- drop console lock during suspend/resume
- bump driver version
msm:
- VM BIND support
- CI: infrastructure updates
- UBWC single source of truth
- decouple GPU and KMS support
- DP: rework I/O accessors
- DPU: SM8750 support
- DSI: SM8750 support
- GPU: X1-45 support and speedbin support for X1-85
- MDSS: SM8750 support
nova:
- register! macro improvements
- DMA object abstraction
- VBIOS parser + fwsec lookup
- sysmem flush page support
- falcon: generic falcon boot code and HAL
- FWSEC-FRTS: fb setup and load/execute
ivpu:
- Add Wildcat Lake support
- Add turbo flag
ast:
- improve hardware generations implementation
imx:
- IMX8qxq Display Controller support
lima:
- Rockchip RK3528 GPU support
nouveau:
- fence handling cleanup
panfrost:
- MT8370 support
- bo labeling
- 64-bit register access
qaic:
- add RAS support
rockchip:
- convert inno_hdmi to a bridge
rz-du:
- add RZ/V2H(P) support
- MIPI-DSI DCS support
sitronix:
- ST7567 support
sun4i:
- add H616 support
tidss:
- add TI AM62L support
- AM65x OLDI bridge support
bochs:
- drm panic support
vkms:
- YUV and R* format support
- use faux device
vmwgfx:
- fence improvements
hyperv:
- move out of simple
- add drm_panic support"
* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
drm/tidss: encoder: convert to devm_drm_bridge_alloc()
drm/amdgpu: move reset support type checks into the caller
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu: Add WARN_ON to the resource clear function
drm/amd/pm: Use cached metrics data on SMUv13.0.6
drm/amd/pm: Use cached data for min/max clocks
gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
drm/amdgpu: Replace HQD terminology with slots naming
drm/amdgpu: Add user queue instance count in HW IP info
drm/amd/amdgpu: Add helper functions for isp buffers
drm/amd/amdgpu: Initialize swnode for ISP MFD device
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
sysfs:
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
Support cache-ids for device-tree systems:
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
Rust:
- Device:
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres:
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID:
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA:
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O:
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc:
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
Misc:
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()"
* tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits)
rust: io: fix broken intra-doc links to `platform::Device`
rust: io: fix broken intra-doc link to missing `flags` module
rust: io: mem: enable IoRequest doc-tests
rust: platform: add resource accessors
rust: io: mem: add a generic iomem abstraction
rust: io: add resource abstraction
rust: samples: dma: set DMA mask
rust: platform: implement the `dma::Device` trait
rust: pci: implement the `dma::Device` trait
rust: dma: add DMA addressing capabilities
rust: dma: implement `dma::Device` trait
rust: net::phy Change module_phy_driver macro to use module_device_table macro
rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id
rust: device_id: split out index support into a separate trait
device: rust: rename Device::as_ref() to Device::from_raw()
arm64: cacheinfo: Provide helper to compress MPIDR value into u32
cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
cacheinfo: Set cache 'id' based on DT data
container_of: Document container_of() is not to be used in new code
driver core: auxiliary bus: fix OF node leak
...
|
|
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.
Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
1. Add kicker firmwares loading for gfx12/smu14/psp14
2. Register additional MODULE_FIRMWARE entries for kicker fws
- gc_12_0_1_rlc_kicker.bin
- gc_12_0_1_imu_kicker.bin
- psp_14_0_3_sos_kicker.bin
- psp_14_0_3_ta_kicker.bin
- smu_14_0_3_kicker.bin
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Gui Chengming <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Add lock protection for 'ring->wptr'/'ring->rptr' to ensure the correct execution.
Fixes: 8652920d2c00 ("drm/amdgpu: add mutex lock for cper ring")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
If vm belongs to another process, this is fclose after fork,
wait may enable signaling KFD eviction fence and cause parent process queue evicted.
[677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu]
[677852.634814] __dma_fence_enable_signaling+0x3e/0xe0
[677852.634820] dma_fence_wait_timeout+0x3a/0x140
[677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl]
[677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu]
[677852.635026] amdgpu_flush+0x34/0x50 [amdgpu]
[677852.635208] filp_flush+0x38/0x90
[677852.635213] filp_close+0x14/0x30
[677852.635216] do_close_on_exec+0xdd/0x130
[677852.635221] begin_new_exec+0x1da/0x490
[677852.635225] load_elf_binary+0x307/0xea0
[677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5
[677852.635235] ? ima_bprm_check+0xa2/0xd0
[677852.635240] search_binary_handler+0xda/0x260
[677852.635245] exec_binprm+0x58/0x1a0
[677852.635249] bprm_execve.part.0+0x16f/0x210
[677852.635254] bprm_execve+0x45/0x80
[677852.635257] do_execveat_common.isra.0+0x190/0x200
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Gang Ba <Gang.Ba@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Used to to set the MQD appropriately for each queue type.
Kernel queues have additional privileges.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.16.x
|
|
After a recent change in clang to expose uninitialized warnings from
const variables and pointers [1], there is a warning in
imu_v12_0_program_rlc_ram() because data is passed uninitialized to
program_imu_rlc_ram():
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized]
374 | program_imu_rlc_ram(adev, data, (const u32)size);
| ^~~~
As this warning happens early in clang's frontend, it does not realize
that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is
never actually called, and even if it were, data would not be
dereferenced because size is 0.
Just initialize data to NULL to silence the warning, as the commit that
added program_imu_rlc_ram() mentioned it would eventually be used over
the old method, at which point data can be properly initialized and
used.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2107
Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram")
Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc8/final?:
- Revert all uses of drm_gem_object->dmabuf to
drm_gem_object->import_attach->dmabuf.
- Fix amdgpu returning BIOS cluttered VRAM after resume.
- Scheduler hang fix.
- Revert nouveau ioctl fix as it caused regressions.
- Fix null pointer deref in nouveau.
- Fix unnecessary semicolon in ti_sn_bridge_probe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/72235afd-c849-49fe-9cc1-2b1781abdf08@linux.intel.com
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.17-2025-07-17:
amdgpu:
- Partition fixes
- Reset fixes
- RAS fixes
- i2c fix
- MPC updates
- DSC cleanup
- EDID fixes
- Display idle D3 update
- IPS updates
- DMUB updates
- Retimer fix
- Replay fixes
- Fix DC memory leak
- Initial support for smartmux
- DCN 4.0.1 degamma LUT fix
- Per queue reset cleanups
- Track ring state associated with a fence
- SR-IOV fixes
- SMU fixes
- Per queue reset improvements for GC 9+ compute
- Per queue reset improvements for GC 10+ gfx
- Per queue reset improvements for SDMA 5+
- Per queue reset improvements for JPEG 2+
- Per queue reset improvements for VCN 2+
- GC 8 fix
- ISP updates
amdkfd:
- Enable KFD on LoongArch
radeon:
- Drop console lock during suspend/resume
UAPI:
- Add userq slot info to INFO IOCTL
Used for IGT userq validation tests (https://lists.freedesktop.org/archives/igt-dev/2025-July/093228.html)
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250717213827.2061581-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- mode_config: Change fb_create prototype to pass the drm_format_info
and avoid redundant lookups in drivers
- sched: kunit improvements, memory leak fixes, reset handling
improvements
- tests: kunit EDID update
Driver Changes:
- amdgpu: Hibernation fixes, structure lifetime fixes
- nouveau: sched improvements
- sitronix: Add Sitronix ST7567 Support
- bridge:
- Make connector available to bridge detect hook
- panel:
- More refcounting changes
- New panels: BOE NE14QDM
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
|
|
Rather than checking in the callbacks, check if the reset
type is supported in the caller.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Drop the soft_recovery callbacks as the queue reset replaces
it.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-emit the unprocessed state after resetting the queue.
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set the dirty bit when the memory resource is not cleared
during BO release.
v2(Christian):
- Drop the cleared flag set to false.
- Improve the amdgpu_vram_mgr_set_clear_state() function.
v3:
- Add back the resource clear flag set function call after
being cleared during eviction (Christian).
- Modified the patch subject name.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the
ring pointer always to be reset on resume from suspend. This caused compute
rings to fail since the reset was done without also resetting it for the
firmware. Reset wptr on the GPU to avoid a disconnect between the driver
and firmware wptr.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911
Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs")
Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747)
Cc: stable@vger.kernel.org
|
|
Increment the reset counter only if soft recovery succeeded. This is
consistent with a ring hard reset behaviour where counter gets
incremented only if hard reset succeeded.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 25c314aa3ec3d30e4ee282540e2096b5c66a2437)
Cc: stable@vger.kernel.org
|
|
The term "HQD" is CP-specific and doesn't
accurately describe the queue resources for other IP blocks like SDMA,
VCN, or VPE. This change:
1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect
the generic nature of the resource counting
2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots`
3. Maintains the same functionality while using more appropriate terminology
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This change exposes the number of available user queue instances
for each hardware IP type (GFX, COMPUTE, SDMA) through the
drm_amdgpu_info_hw_ip interface.
Key changes:
1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure
2. Implemented counting of available HQD slots using:
- mes.gfx_hqd_mask for GFX queues
- mes.compute_hqd_mask for COMPUTE queues
- mes.sdma_hqd_mask for SDMA queues
3. Only counts available instances when user queues are enabled
(!disable_uq)
v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks
to determine the number of queue slots available for each engine type (Alex)
v3: rename userq_num_instance to userq_num_hqds (Alex)
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Accessing amdgpu internal data structures "struct amdgpu_device"
and "struct amdgpu_bo" in ISP V4L2 driver to alloc/free GART
buffers is not recommended.
Add new amdgpu_isp helper functions that takes opaque params
from ISP V4L2 driver and calls the amdgpu internal functions
amdgpu_bo_create_isp_user() and amdgpu_bo_create_kernel() to
alloc/free GART buffers.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Create amd_isp_capture MFD device with swnode initialized to
isp specific software_node part of fwnode graph in amd_isp4
x86/platform driver. The isp driver use this swnode handle
to retrieve the critical properties (data-lanes, mipi phyid,
link-frequencies etc.) required for camera to work on AMD ISP4
based targets.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the
ring pointer always to be reset on resume from suspend. This caused compute
rings to fail since the reset was done without also resetting it for the
firmware. Reset wptr on the GPU to avoid a disconnect between the driver
and firmware wptr.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911
Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs")
Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make the handling consistent with other IPs and across
JPEG versions.
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Avoid using the mapping here.
v2: use amdgpu_xgmi_same_hive() as suggested by Felix
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the new helpers to handle engine resets for VCN.
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the new helpers to handle engine resets for VCN.
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the new helpers to handle engine resets for VCN.
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|