summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2025-03-10drm/xe/guc_pc: Retry and wait longer for GuC PC startRodrigo Vivi1-13/+40
In a rare situation of thermal limit during resume, GuC can be slow and run into delays like this: xe 0000:00:02.0: [drm] GT1: excessive init time: 667ms! \ [status = 0x8002F034, timeouts = 0] xe 0000:00:02.0: [drm] GT1: excessive init time: \ [freq = 100MHz (req = 800MHz), before = 100MHz, \ perf_limit_reasons = 0x1C001000] xe 0000:00:02.0: [drm] *ERROR* GT1: GuC PC Start failed ------------[ cut here ]------------ xe 0000:00:02.0: [drm] GT1: Failed to start GuC PC: -EIO When this happens, it will block entirely the GPU to be used. So, let's try and with a huge timeout in the hope it comes back. Also, let's collect some information on how long it is usually taking on situations like this, so perhaps the time can be tuned later. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307160307.1093391-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe: Use correct type width for alignment in fb pinning codeTvrtko Ursulin1-10/+10
Plane->min_alignment returns an unsigned int so lets use that in the whole relevant call chain. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307111402.26577-5-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe: Pass flags directly to emit_flush_imm_ggttTvrtko Ursulin1-7/+6
This is more readable than the nameless booleans and will also come handy later. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307111402.26577-4-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe: Fix ring flush invalidationTvrtko Ursulin1-9/+6
Emit_flush_invalidate() is incorrectly marking the write to LRC_PPHWSP as a GGTT write and also writing an atypical ~0 dword as the payload. Fix it. While at it drop the unused flags argument. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307111402.26577-3-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe: Fix MOCS debugfs LNCF readoutTvrtko Ursulin1-1/+3
With only XE_FW_GT taken LNCF registers read back as all zeroes, leading to a wild goose chase trying to figure out why is register programming incorrect. Fix it by grabbing XE_FORCEWAKE_ALL for affected platforms. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307111402.26577-2-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe/pm: Temporarily disable D3Cold on BMGRodrigo Vivi1-1/+12
Currently, many instability cases related to D3Cold -> D0 transition on BMG are under investigation. Among them some bad cases where the device is lost after 1 to 3 transitions from D3Cold to D0 on the runtime pm, with pcieport upstream bridge port link retrain failure. In other cases, it works fine, but with some sudden random memory corruptions after D3cold, that could be 0xffff missed ack on GT forcewake or GuC reload related failures. In some other cases though, D3Cold -> D0 works pretty reliably. It looks like it is a combination of GPU cards and Host boards at this point. So, there is no possible/available quirk at this time. This patch disables the D3Cold by default on BMG by reducing the vram_d3cold_threshold to 0. Users and developers who wants to enable it are still able to via $ echo 300 > /sys/bus/pci/devices/<addr>/vram_d3cold_threshold Fixes: 3adcf970dc7e ("drm/xe/bmg: Drop force_probe requirement") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4395 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4396 Cc: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250308005636.1475420-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit d945cc876277851053c0cf37927c8d7bd9d0e880) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/i915/cdclk: Do cdclk post plane programming laterVille Syrjälä1-3/+2
We currently call intel_set_cdclk_post_plane_update() far too early. When pipes are active during the reprogramming the current spot only works for the cd2x divider update case, as that is synchronize to the pipe's vblank. Squashing and crawling are not synchronized in any way, so doing the programming while the pipes/planes are potentially still using the old hardware state could lead to underruns. Move the post plane reprgramming to a spot where we know that the pipes/planes have switched over the new hardware state. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250218211913.27867-2-ville.syrjala@linux.intel.com Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> (cherry picked from commit fb64f5568c0e0b5730733d70a012ae26b1a55815) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe/userptr: Fix an incorrect assertThomas Hellström1-1/+5
The assert incorrectly checks the total length processed which can in fact be greater than the number of pages. Fix. Fixes: 0a98219bcc96 ("drm/xe/hmm: Don't dereference struct page pointers without notifier lock") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307100109.21397-1-thomas.hellstrom@linux.intel.com (cherry picked from commit 70e5043ba85eae199b232e39921abd706b5c1fa4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe: Release guc ids before cancelling workTejas Upadhyay1-1/+1
A GT resets can be occurring in parallel while cancelling work in async call which can requeue these workers. to avoid that, lets first release guc ids and then cancel work so they don't requeued. Fixes: 8ae8a2e8dd21 ("drm/xe: Long running job update") Fixes: 12c2f962fe71 ("drm/xe: cancel pending job timer before freeing scheduler") Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306131211.975503-1-tejas.upadhyay@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 8e8d76f62329127b31c64a034b052fb9e30e92af) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe/pm: Temporarily disable D3Cold on BMGRodrigo Vivi1-1/+12
Currently, many instability cases related to D3Cold -> D0 transition on BMG are under investigation. Among them some bad cases where the device is lost after 1 to 3 transitions from D3Cold to D0 on the runtime pm, with pcieport upstream bridge port link retrain failure. In other cases, it works fine, but with some sudden random memory corruptions after D3cold, that could be 0xffff missed ack on GT forcewake or GuC reload related failures. In some other cases though, D3Cold -> D0 works pretty reliably. It looks like it is a combination of GPU cards and Host boards at this point. So, there is no possible/available quirk at this time. This patch disables the D3Cold by default on BMG by reducing the vram_d3cold_threshold to 0. Users and developers who wants to enable it are still able to via $ echo 300 > /sys/bus/pci/devices/<addr>/vram_d3cold_threshold Fixes: 3adcf970dc7e ("drm/xe/bmg: Drop force_probe requirement") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4395 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4396 Cc: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250308005636.1475420-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-10drm/xe/rtp: Drop sentinels from arg to xe_rtp_process_to_sr()Lucas De Marchi7-26/+18
There's a mismatch on API: while xe_rtp_process_to_sr() processes entries until an entry without name, the active tracking with xe_rtp_process_ctx_enable_active_tracking() needs to use the number of elements. The number of elements is taken everywhere using ARRAY_SIZE(), but that will have one entry too many. This leads to the following warning, as reported by lkp: drivers/gpu/drm/xe/xe_tuning.c: In function 'xe_tuning_dump': >> include/drm/drm_print.h:228:31: warning: '%s' directive argument is null [-Wformat-overflow=] 228 | drm_printf((printer), "%.*s" fmt, (indent), "\t\t\t\t\tX", ##__VA_ARGS__) | ^~~~~~ drivers/gpu/drm/xe/xe_tuning.c:226:17: note: in expansion of macro 'drm_printf_indent' 226 | drm_printf_indent(p, 1, "%s\n", engine_tunings[idx].name); | ^~~~~~~~~~~~~~~~~ That's because it will still process the last entry when tracking the active tunings. The same issue exists in the WAs. Change xe_rtp_process_to_sr() to also take the number of elements so the empty entry can be removed and the warning should go away. Fixing on the active-tracking side would more fragile as the it would need a `- 1` everywhere and continue to use a different approach for number of elements. Aside from the warning, it's a non-issue as there would always be enough bits allocated and the last entry would never be active since xe_rtp_process_to_sr() stops on the sentinel. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503021906.P2MwAvyK-lkp@intel.com/ Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306-fix-print-warning-v1-1-979c3dc03c0d@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-10drm/i915/xe3lpd: Map POWER_DOMAIN_AUDIO_PLAYBACK to DC_offGustavo Sousa1-0/+1
In Xe3_LPD, display audio has the core audio logic located in PG0 and per-transcoder logic in the same power well that provides power for the transcoder [1]. For stuff like audio device enumeration, we need to ensure that PG0 is turned on. For playback, we additionally need the transcoder's power well to be enabled. That essentially means that, for audio playback, there isn't a special power well that needs to be enabled, because modeset sequences will ensure that the required power wells are enabled. That said, there might be cases where PG0 could be disabled due to display entering DC6 while the audio driver tries to interact with the graphics driver for stuff like audio device enumeration. We recently hit that kind of scenario, where "aplay -l" was being used to enumerate audio devices on a PTL machine with PSR enabled and no external displays attached. Since intel_audio_component_get_power() uses POWER_DOMAIN_AUDIO_PLAYBACK, make sure to map that power domain to DC_off power well, so that we disable dynamic DC states (which includes DC6) while the audio driver needs display audio power. [1] The core-audio vs per-transcoder logic split is not really new in Xe3_LPD. This is also true for previous display generations. We need to figure out the correct version where this split happened so that we can apply fixes in the current power domain mapping. Bspec: 72519 Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-xe3lpd-power-domain-audio-playback-v1-1-5765f21da977@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-03-10drm/xe: Remove GEN11 prefixes from documentationLucas De Marchi1-1/+1
The registers are already named without the GEN11 prefix. Do the same in the memirq documentation. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306-drop-gen-v1-2-03683e56006a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-10drm/xe: Remove pointless gen11 assertionsLucas De Marchi1-6/+0
xe driver doesn't really work in gen11. Stop asserting for >= 11, as it would likely explode anyway if tried on such platforms. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306-drop-gen-v1-1-03683e56006a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-10drm/appletbdrm: Fix ref-counting on dmadevThomas Zimmermann1-1/+0
Remove the put_device() call on dmadev. The driver sets the field without getting a reference, so it shouldn't put a reference either. The dmadev field points to the regular USB device for which DRM maintains a reference internally. Hence dmadev will not become dangling during the DRM device's lifetime. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 0670c2f56e45 ("drm/tiny: add driver for Apple Touch Bars in x86 Macs") Cc: Aditya Garg <gargaditya08@live.com> Cc: Aun-Ali Zaidi <admin@kodeit.net> Cc: dri-devel@lists.freedesktop.org Acked-by: Aditya Garg <gargaditya08@live.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307083702.142675-1-tzimmermann@suse.de
2025-03-10drm/panic: clean Clippy warningMiguel Ojeda1-1/+1
Clippy warns: error: manual implementation of an assign operation --> drivers/gpu/drm/drm_panic_qr.rs:418:25 | 418 | self.carry = self.carry % pow; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace it with: `self.carry %= pow` | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#assign_op_pattern Thus clean it up. Fixes: dbed4a797e00 ("drm/panic: Better binary encoding in QR code") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250303093242.1011790-1-ojeda@kernel.org
2025-03-10drm/gma500: Remove unused psb_mmu_virtual_to_pfnDr. David Alan Gilbert2-43/+0
psb_mmu_virtual_to_pfn() was added in 2011 by commit 8c8f1c958ab5 ("gma500: introduce the GTT and MMU handling logic") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250308234428.255164-1-linux@treblig.org
2025-03-10drm/gma500/psb_intel_modes: Remove unused psb_intel_ddc_probeDr. David Alan Gilbert2-32/+0
psb_intel_ddc_probe() was added in 2011 by commit 89c78134cc54 ("gma500: Add Poulsbo support") but has remained unused (probably because drm_get_edid is used instead). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250308234356.255114-1-linux@treblig.org
2025-03-10drm/gpusvm: Fix kernel-docLucas De Marchi1-55/+69
Due to wrong `.. kernel-doc` directive in Documentation/gpu/rfc/gpusvm.rst the documentation was actually not parsing anything from drivers/gpu/drm/drm_gpusvm.c. This fixes the kernel-doc include and all warnings/errors created when doing so. Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/intel-xe/20250307195239.57abcd2d@canb.auug.org.au/ Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307-fix-svm-kerneldoc-v2-1-03c74b199620@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-10drm/hyperv: Fix address space leak when Hyper-V DRM device is removedMichael Kelley1-0/+2
When a Hyper-V DRM device is probed, the driver allocates MMIO space for the vram, and maps it cacheable. If the device removed, or in the error path for device probing, the MMIO space is released but no unmap is done. Consequently the kernel address space for the mapping is leaked. Fix this by adding iounmap() calls in the device removal path, and in the error path during device probing. Fixes: f1f63cbb705d ("drm/hyperv: Fix an error handling path in hyperv_vmbus_probe()") Fixes: a0ab5abced55 ("drm/hyperv : Removing the restruction of VRAM allocation with PCI bar size") Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Tested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20250210193441.2414-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250210193441.2414-1-mhklinux@outlook.com>
2025-03-10Merge tag 'amd-drm-next-6.15-2025-03-07' of ↵Dave Airlie299-5027/+7232
https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - Fix spelling typos - RAS updates - VCN 5.0.1 updates - SubVP fixes - DCN 4.0.1 fixes - MSO DPCD fixes - DIO encoder refactor - PCON fixes - Misc cleanups - DMCUB fixes - USB4 DP fixes - DM cleanups - Backlight cleanups and fixes - Support platform backlight curves - Misc code cleanups - SMU 14 fixes - JPEG 4.0.3 reset updates - SR-IOV fixes - SVM fixes - GC 12 DCC fixes - DC DCE 6.x fix - Hiberation fix amdkfd: - Fix possible NULL pointer in queue validation - Remove unnecessary CP domain validation - SDMA queue reset support - Add per process flags radeon: - Fix spelling typos - RS400 hyperZ fix UAPI: - Add KFD per process flags for setting precision Proposed user space: https://github.com/ROCm/ROCR-Runtime/commit/2a64fa5e06e80e0af36df4ce0c76ae52eeec0a9d Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com
2025-03-09panic_qr: use new #[export] macroAlice Ryhl2-9/+9
This validates at compile time that the signatures match what is in the header file. It highlights one annoyance with the compile-time check, which is that it can only be used with functions marked unsafe. If the function is not unsafe, then this error is emitted: error[E0308]: `if` and `else` have incompatible types --> <linux>/drivers/gpu/drm/drm_panic_qr.rs:987:19 | 986 | #[export] | --------- expected because of this 987 | pub extern "C" fn drm_panic_qr_max_data_size(version: u8, url_len: usize) -> usize { | ^^^^^^^^^^^^^^^^^^^^^^^^^^ expected unsafe fn, found safe fn | = note: expected fn item `unsafe extern "C" fn(_, _) -> _ {kernel::bindings::drm_panic_qr_max_data_size}` found fn item `extern "C" fn(_, _) -> _ {drm_panic_qr_max_data_size}` The signature declarations are moved to a header file so it can be included in the Rust bindings helper, and the extern keyword is removed as it is unnecessary. Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Tamir Duberstein <tamird@gmail.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250303-export-macro-v3-5-41fbad85a27f@google.com [ Fixed `rustfmt`. Moved on top the unsafe requirement comment to follow the usual style, and slightly reworded it for clarity. Formatted bindings helper comment. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-03-09gpu: nova-core: add initial driver stubDanilo Krummrich9-0/+405
Add the initial nova-core driver stub. nova-core is intended to serve as a common base for nova-drm (the corresponding DRM driver) and the vGPU manager VFIO driver, serving as a hard- and firmware abstraction layer for GSP-based NVIDIA GPUs. The Nova project, including nova-core and nova-drm, in the long term, is intended to serve as the successor of Nouveau for all GSP-based GPUs. The motivation for both, starting a successor project for Nouveau and doing so using the Rust programming language, is documented in detail through a previous post on the mailing list [1], an LWN article [2] and a talk from LPC '24. In order to avoid the chicken and egg problem to require a user to upstream Rust abstractions, but at the same time require the Rust abstractions to implement the driver, nova-core kicks off as a driver stub and is subsequently developed upstream. Link: https://lore.kernel.org/dri-devel/Zfsj0_tb-0-tNrJy@cassiopeiae/T/#u [1] Link: https://lwn.net/Articles/990736/ [2] Link: https://youtu.be/3Igmx28B3BQ?si=sBdSEer4tAPKGpOs [3] Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://lore.kernel.org/r/20250306222336.23482-5-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-03-09drm/vc4: plane: fix inconsistent indenting warningCharles Han1-1/+1
Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/vc4/vc4_plane.c:2083 vc6_plane_mode_set() warn: inconsistent indenting Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250305102107.2595-1-hanchunchao@inspur.com
2025-03-09drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLLZhi Wang3-1/+8
Some GSP RPC commands need a new reply policy: "caller don't care about the message content but want to make sure a reply is received". To support this case, a new reply policy is introduced. NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The actual required policy is NVKM_GSP_RPC_REPLY_POLL. This can be observed from the dump of the GSP message queue. After the large GSP RPC command is issued, GSP will write only an empty RPC header in the queue as the reply. Without this change, the policy "receiving the entire message" is used for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes the timeout of receiving the returned GSP message in the suspend/resume path. Introduce the new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the returned GSP message but discards it for the caller. Use the new policy NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. Fixes: 50f290053d79 ("drm/nouveau: support handling the return of large GSP message") Cc: Danilo Krummrich <dakr@kernel.org> Cc: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-3-zhiw@nvidia.com
2025-03-09drm/nouveau/nvkm: factor out current GSP RPC command policiesZhi Wang4-44/+69
There can be multiple cases of handling the GSP RPC messages, which are the reply of GSP RPC commands according to the requirement of the callers and the nature of the GSP RPC commands. The current supported reply policies are "callers don't care" and "receive the entire message" according to the requirement of the callers. To introduce a new policy, factor out the current RPC command reply polices. Also, centralize the handling of the reply in a single function. Factor out NVKM_GSP_RPC_REPLY_NOWAIT as "callers don't care" and NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". Introduce a kernel doc to document the policies. Factor out r535_gsp_rpc_handle_reply(). No functional change is intended for small GSP RPC commands. For large GSP commands, the caller decides the policy of how to handle the returned GSP RPC message. Cc: Ben Skeggs <bskeggs@nvidia.com> Cc: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-2-zhiw@nvidia.com
2025-03-08drm/xe/userptr: Fix an incorrect assertThomas Hellström1-1/+5
The assert incorrectly checks the total length processed which can in fact be greater than the number of pages. Fix. Fixes: ea3e66d280ce ("drm/xe/hmm: Don't dereference struct page pointers without notifier lock") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307100109.21397-1-thomas.hellstrom@linux.intel.com
2025-03-08drm/msm/dpu: drop wb2_formats_rgbDmitry Baryshkov1-31/+0
After enabling YUV support for writeback on a variety of DPU hardware, the wb2_formats_rgb is now unused. Drop it following the report of LKP. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503071857.oZbQsPaE-lkp@intel.com/ Reviewed-by: Rob Clark <robdclark@gmail.com> # on IRC Patchwork: https://patchwork.freedesktop.org/patch/641848/ Link: https://lore.kernel.org/r/20250308-dpu-drop-wb2-rgb-v1-1-f5503fcd1bc2@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2025-03-08drm/msm/dpu: Fix uninitialized variable in dpu_crtc_kickoff_clone_mode()Dan Carpenter1-1/+1
After the loop there is a check for whether "wb_encoder" has been set to non-NULL, however it was never set to NULL. Initialize it to NULL. Fixes: ad06972d5365 ("drm/msm/dpu: Reorder encoder kickoff for CWB") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/641631/ Link: https://lore.kernel.org/r/f8ba03dc-0f90-4781-8d54-c16b3251ecb1@stanley.mountain Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2025-03-08drm/msm/dpu: correct struct dpu_encoder_virt docsDmitry Baryshkov1-1/+1
Fix a typo in struct dpu_encoder_virt kerneldoc, which made it ignore description of the cwb_mask field. Fixes: dd331404ac7c ("drm/msm/dpu: Configure CWB in writeback encoder") Signed-off-by: Dmitry Baryshkov <lumag@kernel.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/641315/ Link: https://lore.kernel.org/r/20250306-dpu-fix-docs-v1-2-e51b71e8ad84@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2025-03-08drm/msm/dpu: correct dpu_crtc_check_mode_changed docsDmitry Baryshkov1-1/+2
Correct commit 20972609d12c ("drm/msm/dpu: Require modeset if clone mode status changes") and describe old_crtc_state and new_crtc_state params instead of the single previously used parameter crtc_state. Fixes: 20972609d12c ("drm/msm/dpu: Require modeset if clone mode status changes") Signed-off-by: Dmitry Baryshkov <lumag@kernel.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/641313/ Link: https://lore.kernel.org/r/20250306-dpu-fix-docs-v1-1-e51b71e8ad84@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2025-03-07drm/amdkfd: Add support for more per-process flagHarish Kasiviswanathan10-17/+41
Add support for more per-process flags starting with option to configure MFMA precision for gfx 9.5 v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION Remove unused else condition v3: Bump the KFD API version v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdkfd: Set per-process flags only once for gfx9/10/11/12Harish Kasiviswanathan4-52/+107
Define set_cache_memory_policy() for these asics and move all static changes from update_qpd() which is called each time a queue is created to set_cache_memory_policy() which is called once during process initialization Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdkfd: Set per-process flags only once cik/viHarish Kasiviswanathan3-85/+94
Set per-process static sh_mem config only once during process initialization. Move all static changes from update_qpd() which is called each time a queue is created to set_cache_memory_policy() which is called once during process initialization. set_cache_memory_policy() is currently defined only for cik and vi family. So this commit only focuses on these two. A separate commit will address other asics. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd: Keep display off while going into S4Mario Limonciello2-2/+14
When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd/amdgpu: Add missing GC 11.5.0 registerTom St Denis1-0/+2
Adds register needed for debugging purposes. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdkfd: clear F8_MODE for gfx950Alex Sierra1-2/+1
Default F8_MODE should be OCP format on gfx950. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: add defines for pin_offsets in DCE8Alexandre Demers2-7/+16
Define pin_offsets values in the same way it is done in DCE8 Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust functionSrinivasan Shanmugam1-0/+2
Updated description for the 'other_mode' parameter. This parameter is used to determine the display mode of another display controller that may be sharing the line buffer. Cc: Ken Wang <Qingqing.Wang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create()Wentao Liang1-0/+5
Add error handling to propagate amdgpu_cgs_create_device() failures to the caller. When amdgpu_cgs_create_device() fails, release hwmgr and return -ENOMEM to prevent null pointer dereference. [v1]->[v2]: Change error code from -EINVAL to -ENOMEM. Free hwmgr. Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd/display: fix missing .is_two_pixels_per_containerAliaksei Urbanski1-0/+1
Starting from 6.11, AMDGPU driver, while being loaded with amdgpu.dc=1, due to lack of .is_two_pixels_per_container function in dce60_tg_funcs, causes a NULL pointer dereference on PCs with old GPUs, such as R9 280X. So this fix adds missing .is_two_pixels_per_container to dce60_tg_funcs. Reported-by: Rosen Penev <rosenp@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3942 Fixes: e6a901a00822 ("drm/amd/display: use even ODM slice width for two pixels per container") Signed-off-by: Aliaksei Urbanski <aliaksei.urbanski@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu/display: Allow DCC for video formats on GFX12David Rosca1-2/+5
We advertise DCC as supported for NV12/P010 formats on GFX12, but it would fail on this check on atomic commit. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Use unique CPER record id across devicesXiang Liu1-5/+13
Encode socket id to CPER record id to be unique across devices. v2: add pointer check for adev->smuio.funcs->get_socket_id v2: set 0 if adev->smuio.funcs->get_socket_id is NULL Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: fix the gb_addr_config_fields init value mismatchShiwu Zhang1-5/+1
For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten in gfx hw_init it is read out to popluate the gb_addr_config_fields in the sw_init stage, which causes mismatch. Fix it by using the golden value in sw_init as well. v2: This is a driver-set golden reg and keep as it is (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: retire ip init code specific for A0 revShiwu Zhang1-12/+1
For aqua_vanjaram, A0 HW is retired so remove the code specific for it in gfx ip init. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: increase RAS bad page thresholdTao Zhou1-3/+3
For default policy, driver will issue an RMA event when the number of bad pages is greater than 8 physical rows, rather than reaches 8 physical rows, don't rely on threshold configurable parameters in default mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Fix missing drain retry fault the last entryEmily Deng2-1/+4
While the entry get in svm_range_unmap_from_cpu is the last entry, and the entry is page fault, it also need to be dropped. So for equal case, it also need to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOVVictor Lu1-1/+2
Aldebaran SRIOV VF cannot access the power brake feature regs. The accesses can be skipped to avoid a dmesg warning. v2: Remove redundant asic type check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdkfd: remove unused debug gws support status variableJonathan Kim1-1/+0
Remove unused declaration of gws_debug_workaround. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Amber Lin <amber.lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amdgpu: fix inconsistent indenting warningCharles Han1-1/+1
Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>