summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)AuthorFilesLines
2025-11-26drm/panthor: Reset queue slots if termination failsAshley Smith1-2/+12
Make sure the queue slot is reset even if we failed termination so we don't have garbage in the CS input interface after a reset. In practice that's not a problem because we zero out all RW sections when a hangs occurs, but it's safer to reset things manually, in case we decide to not conditionally reload RW sections based on the type of hang. v4: - Split the changes in two separate patches v5: - No changes v6: - Adjust the explanation in the commit message - Drop the Fixes tag - Put after the timeout changes and make the two patches independent so one can be backported, and the other not v7: - Use the local group variable instead of dereferencing csg_slot->group - Add Steve's R-b v8: - No changes Signed-off-by: Ashley Smith <ashley.smith@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patch.msgid.link/20251113105734.1520338-3-boris.brezillon@collabora.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Make the timeout per-queue instead of per-jobAshley Smith1-78/+199
The timeout logic provided by drm_sched leads to races when we try to suspend it while the drm_sched workqueue queues more jobs. Let's overhaul the timeout handling in panthor to have our own delayed work that's resumed/suspended when a group is resumed/suspended. When an actual timeout occurs, we call drm_sched_fault() to report it through drm_sched, still. But otherwise, the drm_sched timeout is disabled (set to MAX_SCHEDULE_TIMEOUT), which leaves us in control of how we protect modifications on the timer. One issue seems to be when we call drm_sched_suspend_timeout() from both queue_run_job() and tick_work() which could lead to races due to drm_sched_suspend_timeout() not having a lock. Another issue seems to be in queue_run_job() if the group is not scheduled, we suspend the timeout again which undoes what drm_sched_job_begin() did when calling drm_sched_start_timeout(). So the timeout does not reset when a job is finished. v2: - Fix syntax error v3: - Split the changes in two commits v4: - No changes v5: - No changes v6: - Fix a NULL deref in group_can_run(), and narrow the group variable scope to avoid such mistakes in the future - Add an queue_timeout_is_suspended() helper to clarify things v7: - No changes v8: - Don't touch drm_gpu_scheduler::timeout in queue_timedout_job() Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com> Signed-off-by: Ashley Smith <ashley.smith@collabora.com> Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patch.msgid.link/20251113105734.1520338-2-boris.brezillon@collabora.com
2025-11-26drm/gem: Correct error condition in drm_gem_objects_lookupSteven Price1-5/+2
When vmemdup_array_user() fails, 'handles' is set to a negative error code and no memory is allocated. So the call to kvfree() should not happen. Instead just return early with the error code. Fixes: cb77b79abf5f ("drm/gem: Use vmemdup_array_user in drm_gem_objects_lookup") Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patch.msgid.link/20251124112039.117748-1-steven.price@arm.com
2025-11-26drm/xe: Add caching pagetable flagZbigniew Kempczyński4-2/+10
Introduce device xe_caching_pt flag to selectively turn it on for supported platforms. It allows to eliminate version check and enable this feature for the future platforms. Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20251125153732.400766-2-zbigniew.kempczynski@intel.com
2025-11-26drm/xe/vm: Skip ufence association for CPU address mirror VMA during MAPHimal Prasad Ghimiray1-1/+2
The MAP operation for a CPU address mirror VMA does not require ufence association because such mappings are not GPU-synchronized and do not participate in GPU job completion signaling. Remove the unnecessary ufence addition for this case to avoid -EBUSY failure in check_ufence of unbind ops. Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251125075628.1182481-6-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-11-26drm/xe/svm: Enable UNMAP for VMA merging operationsHimal Prasad Ghimiray3-6/+8
ALLOW UNMAP of VMAs associated with SVM mappings when the MAP operation is intended to merge adjacent CPU_ADDR_MIRROR VMAs. v2 - Remove mapping exist check in garbage collector Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251125075628.1182481-5-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-11-26drm/xe/svm: Extend MAP range to reduce vma fragmentationHimal Prasad Ghimiray1-2/+7
When DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR is set during VM_BIND_OP_MAP, the mapping logic now checks adjacent cpu_addr_mirror VMAs with default attributes and expands the mapping range accordingly. This ensures that bo_unmap operations ideally target the same area and helps reduce fragmentation by coalescing nearby compatible VMAs into a single mapping. Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251125075628.1182481-4-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-11-26drm/xe: Merge adjacent default-attribute VMAs during garbage collectionHimal Prasad Ghimiray1-26/+36
While restoring default memory attributes for VMAs during garbage collection, extend the target range by checking neighboring VMAs. If adjacent VMAs are CPU-address-mirrored and have default attributes, include them in the mergeable range to reduce fragmentation and improve VMA reuse. v2 -Rebase Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251125075628.1182481-3-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-11-26drm/xe: Add helper to extend CPU-mirrored VMA range for mergeHimal Prasad Ghimiray2-0/+44
Introduce xe_vm_find_cpu_addr_mirror_vma_range(), which computes an extended range around a given range by including adjacent VMAs that are CPU-address-mirrored and have default memory attributes. This helper is useful for determining mergeable range without performing the actual merge. v2 - Add assert - Move unmap check to this patch v3 - Decrease offset to check by SZ_4K to avoid wrong vma return in fast lookup path v4 - *start should be >= SZ_4K (Matt) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251125075628.1182481-2-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-11-26drm/panthor: Improve IOMMU map/unmap debugging logsLoïc Molinari1-5/+14
Log the number of pages and their sizes actually mapped/unmapped by the IOMMU page table driver. Since a map/unmap op is often split in several ops depending on the underlying scatter/gather table, add the start address and the total size to the debugging logs in order to help understand which batch an op is part of. Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patch.msgid.link/20251114170303.2800-10-loic.molinari@collabora.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Add support for Mali-G1 GPUsKarunika Choo2-4/+32
Add support for Mali-G1 GPUs (CSF architecture v14), introducing a new panthor_hw_arch_v14 entry with reset and L2 power management operations via the PWR_CONTROL block. Mali-G1 introduces a dedicated PWR_CONTROL block for managing resets and power domains. panthor_gpu_info_init() is updated to use this block for L2, tiler, and shader domain present register reads. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-9-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Support 64-bit endpoint_req register for Mali-G1Karunika Choo3-10/+72
Add support for the 64-bit endpoint_req register introduced in CSF v4.0+ GPUs. Unlike a simple register widening, the 64-bit variant occupies the next 64 bits after the original 32-bit field, requiring version-dependent access. This change introduces helper functions to read, write, and update the endpoint_req register, ensuring correct handling on both pre-v4.0 and v4.0+ firmwares. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-8-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Support GLB_REQ.STATE field for Mali-G1 GPUsKarunika Choo2-16/+80
Add support for the GLB_REQ.STATE field introduced in CSF v4.1+, which replaces the HALT bit to provide finer control over the MCU state. This change implements basic handling for transitioning the MCU between ACTIVE and HALT states on Mali-G1 GPUs. The update introduces new helpers to issue the state change requests, poll for MCU halt completion, and restore the MCU to an active state after halting. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-7-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Implement soft reset via PWR_CONTROLKarunika Choo2-0/+52
Add helpers to issue reset commands through the PWR_CONTROL interface and wait for reset completion using IRQ signaling. This enables support for RESET_SOFT operations with timeout handling and status verification. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-6-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Implement L2 power on/off via PWR_CONTROLKarunika Choo3-0/+383
This patch adds common helpers to issue power commands, poll transitions, and validate domain state, then wires them into the L2 on/off paths. The L2 power-on sequence now delegates control of the SHADER and TILER domains to the MCU when allowed, while the L2 itself is never delegated. On power-off, dependent domains beneath the L2 are checked, and if necessary, retracted and powered down to maintain proper domain ordering. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-5-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Introduce panthor_pwr API and power control frameworkKarunika Choo7-1/+240
Add the new panthor_pwr module, which provides basic power control management for Mali-G1 GPUs. The initial implementation includes infrastructure for initializing the PWR_CONTROL block, requesting and handling its IRQ, and checking for PWR_CONTROL support based on GPU architecture. The patch also integrates panthor_pwr with the device lifecycle (init, suspend, resume, and unplug) through the new API functions. It also registers the IRQ handler under the 'gpu' IRQ as the PWR_CONTROL block is located within the GPU_CONTROL block. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-4-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Add architecture-specific function operationsKarunika Choo6-9/+57
Introduce architecture-specific function pointers to support architecture-dependent behaviours. This patch adds the following function pointers and updates their usage accordingly: - soft_reset - l2_power_on - l2_power_off Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-3-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Add arch-specific panthor_hw bindingKarunika Choo3-1/+74
This patch adds the framework for binding to a specific panthor_hw structure based on the architecture major value parsed from the GPU_ID register. This is in preparation of enabling architecture-specific behaviours based on GPU_ID. As such, it also splits the GPU_ID register read operation into its own helper function. This framework allows a single panthor_hw structure to be shared across multiple architectures should there be minimal changes between them via the arch_min and arch_max field of the panthor_hw_entry structure, instead of duplicating the structure across multiple architectures. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Link: https://patch.msgid.link/20251125125548.3282320-2-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-26drm/panthor: Avoid adding of kernel BOs to extobj listAkash Goel1-3/+3
The kernel BOs unnecessarily got added to the external objects list of drm_gpuvm, when mapping to GPU, which would have resulted in few extra CPU cycles being spent at the time of job submission as drm_exec_until_all_locked() loop iterates over all external objects. Kernel BOs are private to a VM and so they share the dma_resv object of the dummy GEM object created for a VM. Use of DRM_EXEC_IGNORE_DUPLICATES flag ensured the recursive locking of the dummy GEM object was ignored. Also no extra space got allocated to add fences to the dma_resv object of dummy GEM object. So no other impact apart from few extra CPU cycles. This commit sets the pointer to dma_resv object of GEM object of kernel BOs before they are mapped to GPU, to prevent them from being added to external objects list. v2: Add R-bs and fixes tags Fixes: 8a1cc07578bf ("drm/panthor: Add GEM logical block") Signed-off-by: Akash Goel <akash.goel@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patch.msgid.link/20251120172118.2741724-1-akash.goel@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-11-25drm/xe: Fix conversion from clock ticks to millisecondsHarish Chegondi1-6/+1
When tick counts are large and multiplication by MSEC_PER_SEC is larger than 64 bits, the conversion from clock ticks to milliseconds can go bad. Use mul_u64_u32_div() instead. Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: 49cc215aad7f ("drm/xe: Add xe_gt_clock_interval_to_ms helper") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/1562f1b62d5be3fbaee100f09107f3cc49e40dd1.1763408584.git.harish.chegondi@intel.com (cherry picked from commit 96b93ac214f9dd66294d975d86c5dee256faef91) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-25drm/xe/guc: Fix stack_depot usageLucas De Marchi1-0/+3
Add missing stack_depot_init() call when CONFIG_DRM_XE_DEBUG_GUC is enabled to fix the following call stack: [] BUG: kernel NULL pointer dereference, address: 0000000000000000 [] Workqueue: drm_sched_run_job_work [gpu_sched] [] RIP: 0010:stack_depot_save_flags+0x172/0x870 [] Call Trace: [] <TASK> [] fast_req_track+0x58/0xb0 [xe] Fixes: 16b7e65d299d ("drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from") Tested-by: Sagar Ghuge <sagar.ghuge@intel.com> Cc: stable@vger.kernel.org # v6.17+ Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20251118-fix-debug-guc-v1-1-9f780c6bedf8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 64fdf496a6929a0a194387d2bb5efaf5da2b542f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-25drm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()Shuicheng Lin1-6/+6
xe_guc_ct_init_noalloc() allocates the CT workqueue and other helpers before it tries to initialize ct->lock. If drmm_mutex_init() fails we currently bail out without releasing those resources because the guc_ct_fini() hasn’t been registered yet. Since destroy_workqueue() in guc_ct_fini() may flush the workqueue, which in turn can take the ct lock, the initialization sequence is restructured to first initialize the ct->lock, then set up all CT state, and finally register guc_ct_fini(). v2: guc_ct_fini() does take ct lock. (Matt) v3: move primelockdep() together with drmm_mutex_init(). (Lucas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251110184522.1581001-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 2e4ad5b0667244f496783c58de0995b9562d3344) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-25drm/xe: Protect against unset LRC when pausing submissionsTomasz Lis1-6/+16
While pausing submissions, it is possible to encouner an exec queue which is during creation, and therefore doesn't have a valid xe_lrc struct reference. Protect agains such situation, by checking for NULL before access. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause") Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251124222853.1900800-1-tomasz.lis@intel.com
2025-11-25drm/xe/vf: Start re-emission from first unsignaled job during VF migrationMatthew Brost3-15/+19
The LRC software ring tail is reset to the first unsignaled pending job's head. Fix the re-emission logic to begin submitting from the first unsignaled job detected, rather than scanning all pending jobs, which can cause imbalance. v2: - Include missing local changes v3: - s/skip_replay/restore_replay (Tomasz) Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://patch.msgid.link/20251121152750.240557-1-matthew.brost@intel.com
2025-11-25drm/xe/pf: Handle MERT catastrophic errorsLukasz Laguna2-0/+16
The MERT block triggers an interrupt when a catastrophic error occurs. Update the interrupt handler to read the MERT catastrophic error type and log appropriate debug message. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-5-lukasz.laguna@intel.com
2025-11-25drm/xe/pf: Add TLB invalidation support for MERTLukasz Laguna9-2/+131
Add support for triggering and handling MERT TLB invalidation. After LMTT updates, the MERT TLB invalidation is initiated to ensure memory translations remain coherent. Completion of the invalidation is signaled via MERT interrupt (bit 13 in the GFX master interrupt register). Detect and handle this interrupt to properly synchronize the invalidation flow. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-4-lukasz.laguna@intel.com
2025-11-25drm/xe/pf: Configure LMTT in MERTLukasz Laguna2-1/+22
On platforms with standalone MERT, the PF driver needs to program LMTT in MERT's LMEM_CFG register. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-3-lukasz.laguna@intel.com
2025-11-25drm/xe: Add device flag to indicate standalone MERTLukasz Laguna4-0/+10
The MERT subsystem manages memory accesses between host and device. On the Crescent Island platform, it requires direct management by the driver. Introduce a device flag and corresponding helpers to identify platforms with standalone MERT, enabling proper initialization and handling. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-2-lukasz.laguna@intel.com
2025-11-25drm/i915: use struct drm_device for clock gating funcsJani Nikula7-17/+18
While we want to refactor intel_clock_gating.[ch] and likely move a lot of display related code to display, start off with a little intermediate change to use struct drm_device in the interface instead of struct drm_i915_private, to allow us to drop another dependency on i915_drv.h and struct drm_i915_private. Cc: Luca Coelho <luciano.coelho@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Link: https://patch.msgid.link/20251121112200.3435099-2-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-11-25drm/i915/cdclk: drop i915_drv.h includeJani Nikula1-1/+1
intel_cdclk.c no longer needs i915_drv.h. Drop it. Reviewed-by: Mika Kahola <mika.kahola@intel.com> Link: https://patch.msgid.link/20251121112200.3435099-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-11-25drm/i915/psr: Reject async flips when selective fetch is enabledVille Syrjälä2-6/+8
The selective fetch code doesn't handle asycn flips correctly. There is a nonsense check for async flips in intel_psr2_sel_fetch_config_valid() but that only gets called for modesets/fastsets and thus does nothing for async flips. Currently intel_async_flip_check_hw() is very unhappy as the selective fetch code pulls in planes that are not even async flips capable. Reject async flips when selective fetch is enabled, until someone fixes this properly (ie. disable selective fetch while async flips are being issued). Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20251105171015.22234-1-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com> (cherry picked from commit a5f0cc8e0cd4007370af6985cb152001310cf20c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-11-25drm/fb-helper: Allocate and release fb_info in single placeThomas Zimmermann12-131/+26
Move the calls to drm_fb_helper_alloc_info() from drivers into a single place in fbdev helpers. Allocates struct fb_info for a new framebuffer device. Then call drm_fb_helper_single_fb_probe() to create an fbdev screen buffer. Also release the instance on errors by calling drm_fb_helper_release_info(). Simplifies the code and fixes the error cleanup for some of the drivers. Regular release of the struct fb_info instance still happens in drm_fb_helper_fini() as before. v2: - remove error rollback in driver implementations (kernel test robot) - initialize info in TTM implementation (kernel test robot) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> # radeon Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> # msm Acked-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patch.msgid.link/20251027081245.80262-1-tzimmermann@suse.de
2025-11-25drm, fbcon, vga_switcheroo: Avoid race condition in fbcon setupThomas Zimmermann1-14/+0
Protect vga_switcheroo_client_fb_set() with console lock. Avoids OOB access in fbcon_remap_all(). Without holding the console lock the call races with switching outputs. VGA switcheroo calls fbcon_remap_all() when switching clients. The fbcon function uses struct fb_info.node, which is set by register_framebuffer(). As the fb-helper code currently sets up VGA switcheroo before registering the framebuffer, the value of node is -1 and therefore not a legal value. For example, fbcon uses the value within set_con2fb_map() [1] as an index into an array. Moving vga_switcheroo_client_fb_set() after register_framebuffer() can result in VGA switching that does not switch fbcon correctly. Therefore move vga_switcheroo_client_fb_set() under fbcon_fb_registered(), which already holds the console lock. Fbdev calls fbcon_fb_registered() from within register_framebuffer(). Serializes the helper with VGA switcheroo's call to fbcon_remap_all(). Although vga_switcheroo_client_fb_set() takes an instance of struct fb_info as parameter, it really only needs the contained fbcon state. Moving the call to fbcon initialization is therefore cleaner than before. Only amdgpu, i915, nouveau and radeon support vga_switcheroo. For all other drivers, this change does nothing. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://elixir.bootlin.com/linux/v6.17/source/drivers/video/fbdev/core/fbcon.c#L2942 # [1] Fixes: 6a9ee8af344e ("vga_switcheroo: initial implementation (v15)") Acked-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: amd-gfx@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Cc: <stable@vger.kernel.org> # v2.6.34+ Link: https://patch.msgid.link/20251105161549.98836-1-tzimmermann@suse.de
2025-11-25drm/client: log: Implement struct drm_client_funcs.restoreThomas Zimmermann1-0/+13
Restore the log client's output when the DRM core invokes the restore callback. Follow the existing behavior of fbdev emulation wrt. the value of the force parameter. If force is false, acquire the DRM master lock and reprogram the display. This is the case when the user-space compositor exits and the DRM core transfers the display back to the in-kernel client. This also enables drm_log output during reboot and shutdown. If force is true, reprogram without considering the master lock. This overrides the current compositor and prints the log to the screen. In case of system malfunction, users can enter SysRq+v to invoke the emergency error reporting. See Documentation/admin-guide/sysrq.rst for more information. v2: - s/exists/exits/ in second paragraph of commit description - fix grammar in commit description Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patch.msgid.link/20251110154616.539328-4-tzimmermann@suse.de
2025-11-25drm/client: Support emergency restore via sysrq for all clientsThomas Zimmermann6-45/+83
Move the sysrq functionality from DRM fbdev helpers to the DRM device and in-kernel clients, so that it becomes available on all clients. DRM fbdev helpers support emergency restoration of the console output via a special key combination. Press SysRq+v to replace the current compositor with the kernel's output on the framebuffer console. This allows users to see the log messages during system emergencies. By moving the functionality from fbdev helpers to the DRM device, any in-kernel client can serve as emergency output. This can be used to bring up drm_log, for example. Each DRM device registers itself to the list of possible sysrq handlers. On receiving SysRq+v, the DRM core goes over all registered devices and restores an in-kernel DRM client for each of them. See Documentation/admin-guide/sysrq.rst on how to invoke SysRq. Switch VTs to bring back the user-space compositor. v2: - declare placeholders as 'static inline' (kernel test robot) - fix grammar in commit description Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patch.msgid.link/20251110154616.539328-3-tzimmermann@suse.de
2025-11-25drm/client: Pass force parameter to client restoreThomas Zimmermann4-23/+13
Add force parameter to client restore and pass value through the layers. The only currently used value is false. If force is true, the client should restore its display even if it does not hold the DRM master lock. This is be required for emergency output, such as sysrq. While at it, inline drm_fb_helper_lastclose(), which is a trivial wrapper around drm_fb_helper_restore_fbdev_mode_unlocked(). Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patch.msgid.link/20251110154616.539328-2-tzimmermann@suse.de
2025-11-25gpu/drm/nouveau: enable THP support for GPU memory migrationBalbir Singh3-84/+231
Enable MIGRATE_VMA_SELECT_COMPOUND support in nouveau driver to take advantage of THP zone device migration capabilities. Update migration and eviction code paths to handle compound page sizes appropriately, improving memory bandwidth utilization and reducing migration overhead for large GPU memory allocations. [balbirs@nvidia.com: fix sparse error] Link: https://lkml.kernel.org/r/20251115003333.3516870-1-balbirs@nvidia.com Link: https://lkml.kernel.org/r/20251001065707.920170-17-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-25mm/zone_device: rename page_free callback to folio_freeBalbir Singh3-9/+11
Change page_free to folio_free to make the folio support for zone device-private more consistent. The PCI P2PDMA callback has also been updated and changed to folio_free() as a result. For drivers that do not support folios (yet), the folio is converted back into page via &folio->page and the page is used as is, in the current callback implementation. Link: https://lkml.kernel.org/r/20251001065707.920170-3-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-25mm/zone_device: support large zone device private foliosBalbir Singh3-3/+3
Patch series "mm: support device-private THP", v7. This patch series introduces support for Transparent Huge Page (THP) migration in zone device-private memory. The implementation enables efficient migration of large folios between system memory and device-private memory Background Current zone device-private memory implementation only supports PAGE_SIZE granularity, leading to: - Increased TLB pressure - Inefficient migration between CPU and device memory This series extends the existing zone device-private infrastructure to support THP, leading to: - Reduced page table overhead - Improved memory bandwidth utilization - Seamless fallback to base pages when needed In my local testing (using lib/test_hmm) and a throughput test, the series shows a 350% improvement in data transfer throughput and a 80% improvement in latency These patches build on the earlier posts by Ralph Campbell [1] Two new flags are added in vma_migration to select and mark compound pages. migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize() support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND is passed in as arguments. The series also adds zone device awareness to (m)THP pages along with fault handling of large zone device private pages. page vma walk and the rmap code is also zone device aware. Support has also been added for folios that might need to be split in the middle of migration (when the src and dst do not agree on MIGRATE_PFN_COMPOUND), that occurs when src side of the migration can migrate large pages, but the destination has not been able to allocate large pages. The code supported and used folio_split() when migrating THP pages, this is used when MIGRATE_VMA_SELECT_COMPOUND is not passed as an argument to migrate_vma_setup(). The test infrastructure lib/test_hmm.c has been enhanced to support THP migration. A new ioctl to emulate failure of large page allocations has been added to test the folio split code path. hmm-tests.c has new test cases for huge page migration and to test the folio split path. A new throughput test has been added as well. The nouveau dmem code has been enhanced to use the new THP migration capability. mTHP support: The patches hard code, HPAGE_PMD_NR in a few places, but the code has been kept generic to support various order sizes. With additional refactoring of the code support of different order sizes should be possible. The future plan is to post enhancements to support mTHP with a rough design as follows: 1. Add the notion of allowable thp orders to the HMM based test driver 2. For non PMD based THP paths in migrate_device.c, check to see if a suitable order is found and supported by the driver 3. Iterate across orders to check the highest supported order for migration 4. Migrate and finalize The mTHP patches can be built on top of this series, the key design elements that need to be worked out are infrastructure and driver support for multiple ordered pages and their migration. HMM support for large folios was added in 10b9feee2d0d ("mm/hmm: populate PFNs from PMD swap entry"). This patch (of 16) Add routines to support allocation of large order zone device folios and helper functions for zone device folios, to check if a folio is device private and helpers for setting zone device data. When large folios are used, the existing page_free() callback in pgmap is called when the folio is freed, this is true for both PAGE_SIZE and higher order pages. Zone device private large folios do not support deferred split and scan like normal THP folios. Link: https://lkml.kernel.org/r/20251001065707.920170-1-balbirs@nvidia.com Link: https://lkml.kernel.org/r/20251001065707.920170-2-balbirs@nvidia.com Link: https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/ [1] Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-25drm/nouveau: verify that hardware supports the flush page addressTimur Tabi5-0/+15
Ensure that the DMA address of the framebuffer flush page is not larger than its hardware register. On GPUs older than Hopper, the register for the address can hold up to a 40-bit address (right-shifted by 8 so that it fits in the 32-bit register), and on Hopper and later it can be 52 bits (64-bit register where bits 52-63 must be zero). Recently it was discovered that under certain conditions, the flush page could be allocated outside this range. Although this bug was fixed, we can ensure that any future changes to this code don't accidentally generate an invalid page address. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251113230323.1271726-2-ttabi@nvidia.com
2025-11-25drm/nouveau: restrict the flush page to a 32-bit addressTimur Tabi1-1/+1
The flush page DMA address is stored in a special register that is not associated with the GPU's standard DMA range. For example, on Turing, the GPU's MMU can handle 47-bit addresses, but the flush page address register is limited to 40 bits. At the point during device initialization when the flush page is allocated, the DMA mask is still at its default of 32 bits. So even though it's unlikely that the flush page could exist above a 40-bit address, the dma_map_page() call could fail, e.g. if IOMMU is disabled and the address is above 32 bits. The simplest way to achieve all constraints is to allocate the page in the DMA32 zone. Since the flush page is literally just a page, this is an acceptable limitation. The alternative is to temporarily set the DMA mask to 40 (or 52 for Hopper and later) bits, but that could have unforseen side effects. In situations where the flush page is allocated above 32 bits and IOMMU is disabled, you will get an error like this: nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0). Fixes: 5728d064190e ("drm/nouveau/fb: handle sysmem flush page from common code") Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251113230323.1271726-1-ttabi@nvidia.com
2025-11-24drm/plane: Remove const qualifier from plane->modifiers allocation typeKees Cook1-1/+1
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "uint64_t *", but the returned type, while matching, will be const qualified. As there is no general way to remove const qualifiers, adjust the allocation type to match the assignment. Link: https://patch.msgid.link/20250426061325.work.665-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24drm/xe/uc: Change assertion to error on huc authentication failureZhanjun Dong1-2/+5
The fault injection test can cause the xe_huc_auth function to fail. This is an intentional failure, so in this scenario we don't want to throw an assert and taint the kernel, because that will impact CI execution. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20251027214212.2856903-1-zhanjun.dong@intel.com
2025-11-24drm/xe/guc: Cleanup GuC log buffer macros and helpersZhanjun Dong6-131/+85
Cleanup GuC log buffer macros and helpers, add Xe style macro prefix. Update buffer type values to align with the GuC specification Update buffer offset calculation. Remove helper functions, replaced with macros. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20251105233143.1168759-1-zhanjun.dong@intel.com
2025-11-24drm/amd/amdgpu: reserve vm invalidation engine for uni_mesMichael Chen1-0/+3
Reserve vm invalidation engine 6 when uni_mes enabled. It is used in processing tlb flush request from host. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 873373739b9b150720ea2c5390b4e904a4d21505) Cc: stable@vger.kernel.org
2025-11-24drm/xe/pf: Use div_u64 when calculating GGTT profileMichal Wajdeczko1-1/+1
This will fix the following error seen on some 32-bit config: "ERROR: modpost: "__udivdi3" [drivers/gpu/drm/xe/xe.ko] undefined!" Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511150929.3vUi6PEJ-lkp@intel.com/ Fixes: e448372e8a8e ("drm/xe/pf: Use migration-friendly GGTT auto-provisioning") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patch.msgid.link/20251115151323.10828-1-michal.wajdeczko@intel.com
2025-11-24drm/amd/pm: adjust the visibility of pp_table sysfs nodeYang Wang4-6/+24
v1: - make pp_table invisible on VF mode (only valid on BM) - make pp_table invisible on Mi* chips (Not supported) - make pp_table invisible if scpm feature is enabled. v2: move pp_table invisible code logic into amdgpu_dpm_get_pp_table() function. v3: add table buffer pointer check both on powerplay & swsmu. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24Revert "drm/amd: fix gfx hang on renoir in IGT reload test"Rodrigo Siqueira1-4/+0
The original patch introduced additional latency during boot time because it triggers a driver reload to avoid a CP hang when the driver is reloaded multiple times. This has been addressed with a more generic solution that triggers the GPU reset only during the unload phase, avoiding extra latency during boot time. For this reason, this commit reverts the original change. This reverts commit 72a98763b473890e6605604bfcaf71fc212b4720. This patch should only be applied if commit: 4355e61835e7 ("drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded") is present. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloadedRodrigo Siqueira1-0/+14
When trying to unload amdgpu in the SteamDeck (TTY mode), the following set of errors happens and the system gets unstable: [..] [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0 amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110). amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [..] amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff! amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff! [..] When the driver initializes the GPU, the PSP validates all the firmware loaded, and after that, it is not possible to load any other firmware unless the device is reset. What is happening in the load/unload situation is that PSP halts the GC engine because it suspects that something is amiss. To address this issue, this commit ensures that the GPU is reset (mode 2 reset) in the unload sequence. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24drm/amd/pm: fix amdgpu_irq enabled counter unbalanced on smu v11.0Yang Wang2-3/+11
v1: - fix amdgpu_irq enabled counter unbalanced issue on smu_v11_0_disable_thermal_alert. v2: - re-enable smu thermal alert to make amdgpu irq counter balance for smu v11.0 if in runpm state [75582.361561] ------------[ cut here ]------------ [75582.361565] WARNING: CPU: 42 PID: 533 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xd8/0xf0 [amdgpu] ... [75582.362211] Tainted: [E]=UNSIGNED_MODULE [75582.362214] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F14a 08/14/2020 [75582.362218] Workqueue: pm pm_runtime_work [75582.362225] RIP: 0010:amdgpu_irq_put+0xd8/0xf0 [amdgpu] [75582.362556] Code: 31 f6 31 ff e9 c9 bf cf c2 44 89 f2 4c 89 e6 4c 89 ef e8 db fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 a8 bf cf c2 <0f> 0b eb c3 b8 fe ff ff ff eb 97 e9 84 e8 8b 00 0f 1f 84 00 00 00 [75582.362560] RSP: 0018:ffffd50d51297b80 EFLAGS: 00010246 [75582.362564] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [75582.362568] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [75582.362570] RBP: ffffd50d51297ba0 R08: 0000000000000000 R09: 0000000000000000 [75582.362573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e72091d2008 [75582.362576] R13: ffff8e720af80000 R14: 0000000000000000 R15: ffff8e720af80000 [75582.362579] FS: 0000000000000000(0000) GS:ffff8e9158262000(0000) knlGS:0000000000000000 [75582.362582] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [75582.362585] CR2: 000074869d040c14 CR3: 0000001e37a3e000 CR4: 00000000003506f0 [75582.362588] Call Trace: [75582.362591] <TASK> [75582.362597] smu_v11_0_disable_thermal_alert+0x17/0x30 [amdgpu] [75582.362983] smu_smc_hw_cleanup+0x79/0x4f0 [amdgpu] [75582.363375] smu_suspend+0x92/0x110 [amdgpu] [75582.363762] ? gfx_v10_0_hw_fini+0xd5/0x150 [amdgpu] [75582.364098] amdgpu_ip_block_suspend+0x27/0x80 [amdgpu] [75582.364377] ? timer_delete_sync+0x10/0x20 [75582.364384] amdgpu_device_ip_suspend_phase2+0x190/0x450 [amdgpu] [75582.364665] amdgpu_device_suspend+0x1ae/0x2f0 [amdgpu] [75582.364948] amdgpu_pmops_runtime_suspend+0xf3/0x1f0 [amdgpu] [75582.365230] pci_pm_runtime_suspend+0x6d/0x1f0 [75582.365237] ? __pfx_pci_pm_runtime_suspend+0x10/0x10 [75582.365242] __rpm_callback+0x4c/0x190 [75582.365246] ? srso_return_thunk+0x5/0x5f [75582.365252] ? srso_return_thunk+0x5/0x5f [75582.365256] ? ktime_get_mono_fast_ns+0x43/0xe0 [75582.365263] rpm_callback+0x6e/0x80 [75582.365267] rpm_suspend+0x124/0x5f0 [75582.365271] ? srso_return_thunk+0x5/0x5f [75582.365275] ? __schedule+0x439/0x15e0 [75582.365281] ? srso_return_thunk+0x5/0x5f [75582.365285] ? __queue_delayed_work+0xb8/0x180 [75582.365293] pm_runtime_work+0xc6/0xe0 [75582.365297] process_one_work+0x1a1/0x3f0 [75582.365303] worker_thread+0x2ba/0x3d0 [75582.365309] kthread+0x107/0x220 [75582.365313] ? __pfx_worker_thread+0x10/0x10 [75582.365318] ? __pfx_kthread+0x10/0x10 [75582.365323] ret_from_fork+0xa2/0x120 [75582.365328] ? __pfx_kthread+0x10/0x10 [75582.365332] ret_from_fork_asm+0x1a/0x30 [75582.365343] </TASK> [75582.365345] ---[ end trace 0000000000000000 ]--- [75582.365350] amdgpu 0000:05:00.0: amdgpu: Fail to disable thermal alert! [75582.365379] amdgpu 0000:05:00.0: amdgpu: suspend of IP block <smu> failed -22 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>