summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
7 daysdrm/amdgpu: fix task hang from failed job submission during process killLiu01 Tong2-4/+14
During process kill, drm_sched_entity_flush() will kill the vm entities. The following job submissions of this process will fail, and the resources of these jobs have not been released, nor have the fences been signalled, causing tasks to hang and timeout. Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to stopped entity. v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling(). Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.") Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1)
7 daysdrm/amdgpu: fix incorrect vm flags to map boJack Xiao1-2/+2
It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b08425fa77ad2f305fe57a33dceb456be03b653f)
7 daysdrm/amdgpu: fix vram reservation issueYiPeng Chai1-2/+1
The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d38eaf27de1b8584f42d6fb3f717b7ec44b3a7a1)
7 daysdrm/amdgpu: Add PSP fw version check for fw reserve GFX commandFrank Min1-3/+16
The fw reserved GFX command is only supported starting from PSP fw version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command. Add a version guard to ensure the command is only used when the running PSP fw meets the minimum version requirement. This ensures backward compatibility and safe operation across fw revisions. Fixes: a3b7f9c306e1 ("drm/amdgpu: reclaim psp fw reservation memory region") Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 065e23170a1e09bc9104b761183e59562a029619)
12 daysMerge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds9-74/+189
Pull drm fixes from Dave Airlie: "This is the fixes that built up in the merge window, mostly amdgpu and xe with one i915 display fix, seems like things are pretty good for rc1. i915: - DP LPFS fixes xe: - SRIOV: PF fixes and removal of need of module param - Fix driver unbind around Devcoredump - Mark xe driver as BROKEN if kernel page size is not 4kB amdgpu: - GC 9.5.0 fixes - SMU fix - DCE 6 DC fixes - mmhub client ID fixes - VRR fix - Backlight fix - UserQ fix - Legacy reset fix - Misc fixes amdkfd: - CRIU fix - Debugfs fix" * tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amdgpu: add missing vram lost check for LEGACY RESET drm/amdgpu/discovery: fix fw based ip discovery drm/amdkfd: Destroy KFD debugfs after destroy KFD wq amdgpu/amdgpu_discovery: increase timeout limit for IFWI init drm/amdgpu: Update SDMA firmware version check for user queue support drm/amdgpu: Add NULL check for asic_funcs drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value" drm/amd/display: fix a Null pointer dereference vulnerability drm/amd/display: Add primary plane to commits for correct VRR handling drm/amdgpu: update mmhub 3.3 client id mappings drm/amdgpu: update mmhub 3.0.1 client id mappings drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming. drm/amd/display: Don't overwrite dce60_clk_mgr drm/amdkfd: Fix checkpoint-restore on multi-xcc drm/amd: Restore cached manual clock settings during resume drm/amd: Restore cached power limit during resume drm/amdgpu: Update external revid for GC v9.5.0 drm/amdgpu: Update supported modes for GC v9.5.0 Mark xe driver as BROKEN if kernel page size is not 4kB ...
13 daysdrm/amdgpu: add missing vram lost check for LEGACY RESETAlex Deucher1-0/+1
Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit aae94897b6661a2a4b1de2d328090fc388b3e0af) Cc: stable@vger.kernel.org
13 daysdrm/amdgpu/discovery: fix fw based ip discoveryAlex Deucher2-36/+41
We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55) Cc: stable@vger.kernel.org
13 daysamdgpu/amdgpu_discovery: increase timeout limit for IFWI initXaver Hugl1-2/+2
With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9ed3d7bdf2dcdf1a1196630fab89a124526e9cc2) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Update SDMA firmware version check for user queue supportJesse.Zhang1-1/+1
This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92e2449241516c95aab95eea91faecd0fa2b7ed5) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Add NULL check for asic_funcsLijo Lazar1-1/+2
If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 582bf7c5158dce16f7dc5b8345b7876bd8031224) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: update mmhub 3.3 client id mappingsAlex Deucher1-1/+104
Update the client id mapping so the correct clients get printed when there is a mmhub page fault. v2: fix typos spotted by David Wu. v3: fix additional typo spotted by David. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e932f4779a2d329841bb9ca70bb80a4bb2d707b6) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: update mmhub 3.0.1 client id mappingsAlex Deucher1-25/+32
Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Retain job->vm in amdgpu_job_prepare_jobYuanShang1-7/+0
The field job->vm is used in function amdgpu_job_run to get the page table re-generation counter and decide whether the job should be skipped. Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use. For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed, then the gfx job should be skipped. Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ed76936c6b10b547c6df4ca75412331e9ef6d339) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Update external revid for GC v9.5.0Lijo Lazar1-0/+2
Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 21c6764ed4bfaecad034bc4fd15dd64c5a436325) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Update supported modes for GC v9.5.0Lijo Lazar1-1/+4
For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in NPS2 mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9d1ac25c7f830e0132aa816393b1e9f140e71148) Cc: stable@vger.kernel.org
2025-08-01Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds9-35/+43
Pull drm fixes from Dave Airlie: "Just a bunch of amdgpu and xe fixes. amdgpu: - DSC divide by 0 fix - clang fix - DC debugfs fix - Userq fixes - Avoid extra evict-restore with KFD - Backlight fix - Documentation fix - RAS fix - Add new kicker handling - DSC fix for DCN 3.1.4 - PSR fix - Atomic fix - DC reset fixes - DCN 3.0.1 fix - MMHUB client mapping fix xe: - Fix BMG probe on unsupported mailbox command - Fix OA static checker warning about null gt - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter - Fix missing unwind goto in GuC/HuC - Don't register I2C devices if VF - Clear whole GuC g2h_fence during initialization - Avoid call kfree for drmm_kzalloc - Fix pci_dev reference leak on configfs - SRIOV: Disable CSC support on VF * tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) drm/xe/vf: Disable CSC support on VF drm/amdgpu: update mmhub 4.1.0 client id mappings drm/amd/display: Allow DCN301 to clear update flags drm/amd/display: Pass up errors for reset GPU that fails to init HW drm/amd/display: Only finalize atomic_obj if it was initialized drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported drm/amd/display: Disable dsc_power_gate for dcn314 by default drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14 drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c' drm/amd/display: fix initial backlight brightness calculation drm/amdgpu: Avoid extra evict-restore process. drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram() drm/amd/display: Fix divide by zero when calculating min ODM factor drm/xe/configfs: Fix pci_dev reference leak drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() drm/xe/guc: Clear whole g2h_fence during initialization drm/xe/vf: Don't register I2C devices if VF ...
2025-07-31Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds120-1870/+3194
Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...
2025-07-29Merge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
2025-07-28drm/amdgpu: update mmhub 4.1.0 client id mappingsAlex Deucher1-21/+13
Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14Frank Min4-7/+21
1. Add kicker firmwares loading for gfx12/smu14/psp14 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_12_0_1_rlc_kicker.bin - gc_12_0_1_imu_kicker.bin - psp_14_0_3_sos_kicker.bin - psp_14_0_3_ta_kicker.bin - smu_14_0_3_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Gui Chengming <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr accessYang Wang1-2/+4
Add lock protection for 'ring->wptr'/'ring->rptr' to ensure the correct execution. Fixes: 8652920d2c00 ("drm/amdgpu: add mutex lock for cper ring") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amdgpu: Avoid extra evict-restore process.Gang Ba1-4/+2
If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_propAlex Deucher2-0/+2
Used to to set the MQD appropriately for each queue type. Kernel queues have additional privileges. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.16.x
2025-07-28drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()Nathan Chancellor1-1/+1
After a recent change in clang to expose uninitialized warnings from const variables and pointers [1], there is a warning in imu_v12_0_program_rlc_ram() because data is passed uninitialized to program_imu_rlc_ram(): drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized] 374 | program_imu_rlc_ram(adev, data, (const u32)size); | ^~~~ As this warning happens early in clang's frontend, it does not realize that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is never actually called, and even if it were, data would not be dereferenced because size is 0. Just initialize data to NULL to silence the warning, as the commit that added program_imu_rlc_ram() mentioned it would eventually be used over the old method, at which point data can be properly initialized and used. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2107 Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-23Merge tag 'drm-misc-fixes-2025-07-23' of ↵Dave Airlie3-0/+20
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.16-rc8/final?: - Revert all uses of drm_gem_object->dmabuf to drm_gem_object->import_attach->dmabuf. - Fix amdgpu returning BIOS cluttered VRAM after resume. - Scheduler hang fix. - Revert nouveau ioctl fix as it caused regressions. - Fix null pointer deref in nouveau. - Fix unnecessary semicolon in ti_sn_bridge_probe. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/72235afd-c849-49fe-9cc1-2b1781abdf08@linux.intel.com
2025-07-21Merge tag 'amd-drm-next-6.17-2025-07-17' of ↵Dave Airlie52-374/+910
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-17: amdgpu: - Partition fixes - Reset fixes - RAS fixes - i2c fix - MPC updates - DSC cleanup - EDID fixes - Display idle D3 update - IPS updates - DMUB updates - Retimer fix - Replay fixes - Fix DC memory leak - Initial support for smartmux - DCN 4.0.1 degamma LUT fix - Per queue reset cleanups - Track ring state associated with a fence - SR-IOV fixes - SMU fixes - Per queue reset improvements for GC 9+ compute - Per queue reset improvements for GC 10+ gfx - Per queue reset improvements for SDMA 5+ - Per queue reset improvements for JPEG 2+ - Per queue reset improvements for VCN 2+ - GC 8 fix - ISP updates amdkfd: - Enable KFD on LoongArch radeon: - Drop console lock during suspend/resume UAPI: - Add userq slot info to INFO IOCTL Used for IGT userq validation tests (https://lists.freedesktop.org/archives/igt-dev/2025-July/093228.html) From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250717213827.2061581-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-07-21Merge tag 'drm-misc-next-2025-07-17' of ↵Dave Airlie6-9/+38
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - mode_config: Change fb_create prototype to pass the drm_format_info and avoid redundant lookups in drivers - sched: kunit improvements, memory leak fixes, reset handling improvements - tests: kunit EDID update Driver Changes: - amdgpu: Hibernation fixes, structure lifetime fixes - nouveau: sched improvements - sitronix: Add Sitronix ST7567 Support - bridge: - Make connector available to bridge detect hook - panel: - More refcounting changes - New panels: BOE NE14QDM Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
2025-07-17drm/amdgpu: move reset support type checks into the callerAlex Deucher25-79/+37
Rather than checking in the callbacks, check if the reset type is supported in the caller. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/sdma7: re-emit unprocessed state on ring resetAlex Deucher1-4/+3
Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/sdma6: re-emit unprocessed state on ring resetAlex Deucher1-4/+3
Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/sdma5.2: re-emit unprocessed state on ring resetAlex Deucher1-2/+6
Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/sdma5: re-emit unprocessed state on ring resetAlex Deucher1-2/+6
Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/gfx12: re-emit unprocessed state on ring resetAlex Deucher1-31/+4
Re-emit the unprocessed state after resetting the queue. Drop the soft_recovery callbacks as the queue reset replaces it. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/gfx11: re-emit unprocessed state on ring resetAlex Deucher1-31/+4
Re-emit the unprocessed state after resetting the queue. Drop the soft_recovery callbacks as the queue reset replaces it. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/gfx10: re-emit unprocessed state on ring resetAlex Deucher1-31/+4
Re-emit the unprocessed state after resetting the queue. Drop the soft_recovery callbacks as the queue reset replaces it. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq resetAlex Deucher1-7/+2
Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu/gfx9: re-emit unprocessed state on kcq resetAlex Deucher1-8/+2
Re-emit the unprocessed state after resetting the queue. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-17drm/amdgpu: Add WARN_ON to the resource clear functionArunpravin Paneer Selvam1-1/+4
Set the dirty bit when the memory resource is not cleared during BO release. v2(Christian): - Drop the cleared flag set to false. - Improve the amdgpu_vram_mgr_set_clear_state() function. v3: - Add back the resource clear flag set function call after being cleared during eviction (Christian). - Modified the patch subject name. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen1-0/+1
Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747) Cc: stable@vger.kernel.org
2025-07-16drm/amdgpu: Increase reset counter only on successLijo Lazar1-2/+7
Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25c314aa3ec3d30e4ee282540e2096b5c66a2437) Cc: stable@vger.kernel.org
2025-07-16drm/amdgpu: Replace HQD terminology with slots namingJesse Zhang1-5/+5
The term "HQD" is CP-specific and doesn't accurately describe the queue resources for other IP blocks like SDMA, VCN, or VPE. This change: 1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect the generic nature of the resource counting 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` 3. Maintains the same functionality while using more appropriate terminology Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: Add user queue instance count in HW IP infoJesse Zhang1-0/+20
This change exposes the number of available user queue instances for each hardware IP type (GFX, COMPUTE, SDMA) through the drm_amdgpu_info_hw_ip interface. Key changes: 1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure 2. Implemented counting of available HQD slots using: - mes.gfx_hqd_mask for GFX queues - mes.compute_hqd_mask for COMPUTE queues - mes.sdma_hqd_mask for SDMA queues 3. Only counts available instances when user queues are enabled (!disable_uq) v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks to determine the number of queue slots available for each engine type (Alex) v3: rename userq_num_instance to userq_num_hqds (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amd/amdgpu: Add helper functions for isp buffersPratap Nirujogi3-10/+176
Accessing amdgpu internal data structures "struct amdgpu_device" and "struct amdgpu_bo" in ISP V4L2 driver to alloc/free GART buffers is not recommended. Add new amdgpu_isp helper functions that takes opaque params from ISP V4L2 driver and calls the amdgpu internal functions amdgpu_bo_create_isp_user() and amdgpu_bo_create_kernel() to alloc/free GART buffers. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amd/amdgpu: Initialize swnode for ISP MFD devicePratap Nirujogi3-6/+11
Create amd_isp_capture MFD device with swnode initialized to isp specific software_node part of fwnode graph in amd_isp4 x86/platform driver. The isp driver use this swnode handle to retrieve the critical properties (data-lanes, mipi phyid, link-frequencies etc.) required for camera to work on AMD ISP4 based targets. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen1-0/+1
Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/jpeg: clean up reset type handlingAlex Deucher8-26/+29
Make the handling consistent with other IPs and across JPEG versions. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: rework gmc_v9_0_get_coherence_flags v2Christian König1-4/+4
Avoid using the mapping here. v2: use amdgpu_xgmi_same_hive() as suggested by Felix Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/vcn3: implement ring resetAlex Deucher1-0/+27
Use the new helpers to handle engine resets for VCN. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/vcn2.5: implement ring resetAlex Deucher1-0/+25
Use the new helpers to handle engine resets for VCN. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/vcn2: implement ring resetAlex Deucher1-0/+26
Use the new helpers to handle engine resets for VCN. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>