summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-02-18drm/pagemap: pass pagemap_addr by referenceArnd Bergmann4-7/+7
Passing a structure by value into a function is sometimes problematic, for a number of reasons. Of of these is a warning from the 32-bit arm compiler: drivers/gpu/drm/drm_gpusvm.c: In function '__drm_gpusvm_unmap_pages': drivers/gpu/drm/drm_gpusvm.c:1152:33: note: parameter passing for argument of type 'struct drm_pagemap_addr' changed in GCC 9.1 1152 | dpagemap->ops->device_unmap(dpagemap, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1153 | dev, *addr); | ~~~~~~~~~~~ This particular problem is harmless since we are not mixing compiler versions inside of the compiler. However, passing this by reference avoids the warning along with providing slightly better calling conventions as it avoids an extra copy on the stack. Fixes: 75af93b3f5d0 ("drm/pagemap, drm/xe: Support destination migration over interconnect") Fixes: 2df55d9e66a2 ("drm/xe: Support pcie p2p dma as a fast interconnect") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260216134644.1025365-1-arnd@kernel.org Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (cherry picked from commit 95162db0208aee122d10ac1342fe97a1721cd258) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe/bo: Redirect faults to dummy page for wedged deviceRaag Jadav1-1/+1
As per uapi documentation[1], the prerequisite for wedged device is to redirected page faults to a dummy page. Follow it. [1] Documentation/gpu/drm-uapi.rst v2: Add uapi reference and fixes tag (Matthew Brost) Fixes: 7bc00751f877 ("drm/xe: Use device wedged event") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260212055622.2054991-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit c020fff70d757612933711dd3cc3751d7d782d3c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe: Make xe_modparam.force_vram_bar_size signedShuicheng Lin1-1/+1
vram_bar_size is registered as an int module parameter and is documented to accept negative values to disable BAR resizing. Store it as an int in xe_modparam as well, so negative values work as intended and the module_param type matches. Fixes: 80742a1aa26e ("drm/xe: Allow to drop vram resizing") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202181853.1095736-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 25c9aa4dcb5ef2ad9f354d19f8f1eeb690d1c161) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe/vf: Avoid reading media version when media GT is disabledPiotr Piórkowski1-0/+6
When the media GT is not allowed, a VF must not attempt to read the media version from the GuC. The GuC may not be loaded, and any attempt to communicate with it would result in a timeout and a VF probe failure: (...) [ 1912.406046] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: GuC mmio request 0x5507: no reply 0x5507 [ 1912.407277] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: [GUC COMMUNICATION] MMIO send failed (-ETIMEDOUT) [ 1912.408689] xe 0000:01:00.1: [drm] *ERROR* VF: Tile0: GT1: Failed to reset GuC state (-ETIMEDOUT) [ 1912.413986] xe 0000:01:00.1: probe with driver xe failed with error -110 Let's skip reading the media version for VFs when the media GT is not allowed. v2: move the condition directly to the VF path Fixes: 7abd69278bb5 ("drm/xe/configfs: Add attribute to disable GT types") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202115041.2863357-1-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit 0bcacf56dc0b265f9c47056c6a4f0c1394a8a3f0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe/xe2_hpg: Fix handling of Wa_14019988906 & Wa_14019877138Matt Roper1-10/+8
The PSS_CHICKEN register has been part of the RCS engine's LRC since it was first introduced in Xe_LP. That means that any workarounds that adjust its value (such as Wa_14019988906 and Wa_14019877138) need to be implemented in the lrc_was[] table so that they become part of the default LRC from which all subsequent LRCs are copied. Although these workarounds were implemented correctly on most platforms, they were incorrectly placed on the engine_was[] table for Xe2_HPG. Move the workarounds to the proper lrc_was[] table and switch the 'xe_rtp_match_first_render_or_compute' rule to specifically match the RCS since that's the engine whose LRC manages the register. Bspec: 65182 Fixes: 7f3ee7d88058 ("drm/xe/xe2hpg: Add initial GT workarounds") Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patch.msgid.link/20260205220508.51905-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit e04c609eedf4d6748ac0bcada4de1275b034fed6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe/mmio: Avoid double-adjust in 64-bit readsShuicheng Lin1-5/+5
xe_mmio_read64_2x32() was adjusting register addresses and then calling xe_mmio_read32(), which applies the adjustment again. This may shift accesses twice if adj_offset < adj_limit. There is no issue currently, as for media gt, adj_offset > adj_limit, so the 2nd adjust will be a no-op. But it may not work in future. To fix it, replace the adjusted-address comparison with a direct sanity check that ensures the MMIO address adjustment cutoff never falls within the 8-byte range of a 64-bit register. And let xe_mmio_read32() handle address translation. v2: rewrite the sanity check in a more natural way. (Matt) v3: Add Fixes tag. (Jani) Fixes: 07431945d8ae ("drm/xe: Avoid 64-bit register reads") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260130165621.471408-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit a30f999681126b128a43137793ac84b6a5b7443f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe: Add bounds check on pat_index to prevent OOB kernel read in madviseJia Yao1-1/+6
When user provides a bogus pat_index value through the madvise IOCTL, the xe_pat_index_get_coh_mode() function performs an array access without validating bounds. This allows a malicious user to trigger an out-of-bounds kernel read from the xe->pat.table array. The vulnerability exists because the validation in madvise_args_are_sane() directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without first checking if pat_index is within [0, xe->pat.n_entries). Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug builds, it still performs the unsafe array access in production kernels. v2(Matthew Auld) - Using array_index_nospec() to mitigate spectre attacks when the value is used v3(Matthew Auld) - Put the declarations at the start of the block Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com (cherry picked from commit 944a3329b05510d55c69c2ef455136e2fc02de29) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe/configfs: Fix 'parameter name omitted' errorsMichal Wajdeczko1-4/+8
On some configs and old compilers we can get following build errors: ../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_mid_bb': ../drivers/gpu/drm/xe/xe_configfs.h:40:76: error: parameter name omitted static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev, enum xe_engine_class, ^~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_post_bb': ../drivers/gpu/drm/xe/xe_configfs.h:42:77: error: parameter name omitted static inline u32 xe_configfs_get_ctx_restore_post_bb(struct pci_dev *pdev, enum xe_engine_class, ^~~~~~~~~~~~~~~~~~~~ when trying to define our configfs stub functions. Fix that. Fixes: 7a4756b2fd04 ("drm/xe/lrc: Allow to add user commands mid context switch") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260203193745.576-1-michal.wajdeczko@intel.com (cherry picked from commit f59cde8a2452b392115d2af8f1143a94725f4827) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-18drm/xe/pf: Fix sysfs initializationMichal Wajdeczko1-28/+26
In case of devm_add_action_or_reset() failure the provided cleanup action will be run immediately on the not yet initialized kobject. This may lead to errors like: [ ] kobject: '(null)' (ff110001393608e0): is not initialized, yet kobject_put() is being called. [ ] WARNING: lib/kobject.c:734 at kobject_put+0xd9/0x250, CPU#0: kworker/0:0/9 [ ] RIP: 0010:kobject_put+0xdf/0x250 [ ] Call Trace: [ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe] [ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe] [ ] xe_sriov_init_late+0x5f/0x2c0 [xe] [ ] xe_device_probe+0x5f2/0xc20 [xe] [ ] xe_pci_probe+0x396/0x610 [xe] [ ] local_pci_probe+0x47/0xb0 [ ] refcount_t: underflow; use-after-free. [ ] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x68/0xb0, CPU#0: kworker/0:0/9 [ ] RIP: 0010:refcount_warn_saturate+0x68/0xb0 [ ] Call Trace: [ ] kobject_put+0x174/0x250 [ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe] [ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe] [ ] xe_sriov_init_late+0x5f/0x2c0 [xe] [ ] xe_device_probe+0x5f2/0xc20 [xe] [ ] xe_pci_probe+0x396/0x610 [xe] [ ] local_pci_probe+0x47/0xb0 Fix that by calling kobject_init() and kobject_add() separately and register cleanup action after the kobject is initialized. Also make this cleanup registration a part of the create helper to fix another mistake, as in the loop we were wrongly passing parent kobject while registering cleanup action, and this resulted in some undetected leaks. Fixes: 5c170a4d9c53 ("drm/xe/pf: Prepare sysfs for SR-IOV admin attributes") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260203235332.1350-1-michal.wajdeczko@intel.com (cherry picked from commit 98b16727f07e26a5d4de84d88805ce7ffcfdd324) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-06Merge tag 'drm-xe-next-fixes-2026-02-05' of ↵Dave Airlie8-11/+20
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next - Fix CFI violation in debugfs access (Daniele) - Kernel-doc fixes (Chaitanya, Shuicheng) - Disable D3Cold for BMG only on specific platforms (Karthik) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aYStaLZVJWwKCDZt@intel.com
2026-02-06Merge tag 'drm-misc-next-fixes-2026-02-05' of ↵Dave Airlie13-80/+103
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Several fixes for amdxdna around PM handling, error reporting and memory safety, a compilation fix for ilitek-ili9882t, a NULL pointer dereference fix for imx8qxp-pixel-combiner and several PTE fixes for nouveau Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260205-refreshing-natural-vole-4c73af@houat
2026-02-06Merge tag 'drm-intel-next-fixes-2026-02-05' of ↵Dave Airlie4-24/+26
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next - Fix the pixel normalization handling for xe3p_lpd display Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patch.msgid.link/aYROngKfyUIyoQW0@jlahtine-mobl
2026-02-05drm/xe/pm: Disable D3Cold for BMG only on specific platformsKarthik Poosa1-3/+10
Restrict D3Cold disablement for BMG to unsupported NUC platforms, instead of disabling it on all platforms. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: 3e331a6715ee ("drm/xe/pm: Temporarily disable D3Cold on BMG") Link: https://patch.msgid.link/20260123173238.1642383-1-karthik.poosa@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 39125eaf8863ab09d70c4b493f58639b08d5a897) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_depShuicheng Lin1-1/+1
Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_tlb_inval_job.c:210 expecting prototype for xe_tlb_inval_alloc_dep(). Prototype was for xe_tlb_inval_job_alloc_dep() instead" Fixes: 15366239e2130 ("drm/xe: Decouple TLB invalidations from GT") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-8-shuicheng.lin@intel.com (cherry picked from commit 9f9c117ac566cb567dd56cc5b7564c45653f7a2a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_earlyShuicheng Lin1-1/+1
Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_tlb_inval.c:136 expecting prototype for xe_gt_tlb_inval_init(). Prototype was for xe_gt_tlb_inval_init_early() instead" v2: add () for the function. (Michal) Fixes: db16f9d90c1d9 ("drm/xe: Split TLB invalidation code in frontend and backend") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-7-shuicheng.lin@intel.com (cherry picked from commit 0651dbb9d6a72e99569576fbec4681fd8160d161) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05drm/xe: Fix kerneldoc for xe_migrate_exec_queueShuicheng Lin1-1/+1
Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_migrate.c:1262 expecting prototype for xe_get_migrate_exec_queue(). Prototype was for xe_migrate_exec_queue() instead" Fixes: 916ee4704a865 ("drm/xe/vf: Register CCS read/write contexts with Guc") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-6-shuicheng.lin@intel.com (cherry picked from commit 9fd8da717934f05125b9ba6782622c459a368dc0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05drm/xe/query: Fix topology query pointer advanceShuicheng Lin1-1/+1
The topology query helper advanced the user pointer by the size of the pointer, not the size of the structure. This can misalign the output blob and corrupt the following mask. Fix the increment to use sizeof(*topo). There is no issue currently, as sizeof(*topo) happens to be equal to sizeof(topo) on 64-bit systems (both evaluate to 8 bytes). Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260130043907.465128-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit c2a6859138e7f73ad904be17dd7d1da6cc7f06b3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI headerChaitanya Kumar Borah1-1/+1
The GuC scheduler ABI header contains a file-level comment that is not intended to document a kernel-doc symbol. Using kernel-doc comment syntax (/** */) triggers kernel-doc warnings. With "-Werror", this causes the build to fail. Convert the comment to a regular block comment. HDRTEST drivers/gpu/drm/xe/abi/guc_scheduler_abi.h Warning: drivers/gpu/drm/xe/abi/guc_scheduler_abi.h:11 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst * Generic defines required for registration with and submissions to the GuC 1 warnings as errors make[6]: *** [drivers/gpu/drm/xe/Makefile:377: drivers/gpu/drm/xe/abi/guc_scheduler_abi.hdrtest] Error 3 make[5]: *** [scripts/Makefile.build:544: drivers/gpu/drm/xe] Error 2 make[4]: *** [scripts/Makefile.build:544: drivers/gpu/drm] Error 2 make[3]: *** [scripts/Makefile.build:544: drivers/gpu] Error 2 make[2]: *** [scripts/Makefile.build:544: drivers] Error 2 make[1]: *** [/home/kbuild2/kernel/Makefile:2088: .] Error 2 make: *** [Makefile:248: __sub-make] Error 2 v2: - Add Fixes tag (Daniele) Fixes: b0c5cf4f5917 ("drm/gt/guc: extract scheduler-related defines from guc_fwif.h") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260130135210.2659200-1-chaitanya.kumar.borah@intel.com (cherry picked from commit f89dbe14a0c8854b7aaf960dd842c10698b3ff19) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05drm/xe/guc: Fix CFI violation in debugfs access.Daniele Ceraolo Spurio2-3/+5
xe_guc_print_info is void-returning, but the function pointer it is assigned to expects an int-returning function, leading to the following CFI error: [ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe] (target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a) Fix this by updating xe_guc_print_info to return an integer. Fixes: e15826bb3c2c ("drm/xe/guc: Refactor GuC debugfs initialization") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: George D Sworo <george.d.sworo@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com (cherry picked from commit dd8ea2f2ab71b98887fdc426b0651dbb1d1ea760) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-02-05accel/amdxdna: Move RPM resume into job run functionLizhi Hou1-10/+9
Currently, amdxdna_pm_resume_get() is called during job creation, and amdxdna_pm_suspend_put() is called when the hardware notifies job completion. If a job is canceled before it is run, no hardware completion notification is generated, resulting in an unbalanced runtime PM resume/suspend pair. Fix this by moving amdxdna_pm_resume_get() to the job run path, ensuring runtime PM is only resumed for jobs that are actually executed. Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260204171118.3165607-1-lizhi.hou@amd.com
2026-02-05accel/amdxdna: Fix incorrect DPM level after suspend/resumeLizhi Hou2-2/+3
The suspend routine sets the DPM level to 0, which unintentionally overwrites the previously saved DPM level. As a result, the device always resumes with DPM level 0 instead of restoring the original value. Fix this by ensuring the suspend path does not overwrite the saved DPM level, allowing the correct DPM level to be restored during resume. Fixes: f4d7b8a6bc8c ("accel/amdxdna: Enhance power management settings") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260204171048.3165580-1-lizhi.hou@amd.com
2026-02-04nouveau/vmm: start tracking if the LPT PTE is valid. (v6)Dave Airlie3-10/+36
When NVK enabled large pages userspace tests were seeing fault reports at a valid address. There was a case where an address moving from 64k page to 4k pages could expose a race between unmapping the 4k page, mapping the 64k page and unref the 4k pages. Unref 4k pages would cause the dual-page table handling to always set the LPTE entry to SPARSE or INVALID, but if we'd mapped a valid LPTE in the meantime, it would get trashed. Keep track of when a valid LPTE has been referenced, and don't reset in that case. This adds an lpte valid tracker and lpte reference count. Whenever an lpte is referenced, it gets made valid and the ref count increases, whenever it gets unreference the refcount is tracked. Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14610 Reviewed-by: Mary Guillemard <mary@mary.zone> Tested-by: Mary Guillemard <mary@mary.zone> Tested-by: Mel Henning <mhenning@darkrefraction.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260204030208.2313241-4-airlied@gmail.com
2026-02-04nouveau/vmm: increase size of vmm pte tracker struct to u32 (v2)Dave Airlie2-7/+8
We need to tracker large counts of spte than previously due to unref getting delayed sometimes. This doesn't fix LPT tracking yet, it just creates space for it. Reviewed-by: Mary Guillemard <mary@mary.zone> Tested-by: Mary Guillemard <mary@mary.zone> Tested-by: Mel Henning <mhenning@darkrefraction.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260204030208.2313241-3-airlied@gmail.com
2026-02-04nouveau/vmm: rewrite pte tracker using a struct and bitfields.Dave Airlie2-24/+31
I want to increase the counters here and start tracking LPTs as well as there are certain situations where userspace with mixed page sizes can cause ref/unrefs to live longer so need better reference counting. This should be entirely non-functional. Reviewed-by: Mary Guillemard <mary@mary.zone> Tested-by: Mary Guillemard <mary@mary.zone> Tested-by: Mel Henning <mhenning@darkrefraction.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260204030208.2313241-2-airlied@gmail.com
2026-02-03accel/amdxdna: Fix incorrect error code returned for failed chain commandLizhi Hou1-1/+1
The driver currently returns an incorrect error code when a chain command fails. In this case, ERT_CMD_STATE_ERROR is expected to be reported for failed chain commands. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260203184037.2751889-1-lizhi.hou@amd.com
2026-02-03accel/amdxdna: Remove hardware context statusLizhi Hou3-28/+5
One newly supported command does not require hardware context configuration to be performed upfront. As a result, checking hardware context status causes this command to fail incorrectly. Remove hardware context status handling entirely. For other commands, if userspace submits a request without configuring the hardware context first, the firmware will report an error or time out as appropriate. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260202212450.2681273-1-lizhi.hou@amd.com
2026-02-03drm/bridge: imx8qxp-pixel-combiner: Fix bailout for imx8qxp_pc_bridge_probe()Liu Ying1-1/+1
In case the channel0 is unavailable and bailing out from free_child is needed when we fail to add a DRM bridge for the available channel1, pointer pc->ch[0] in the bailout path would be NULL and it would be dereferenced as pc->ch[0]->bridge.next_bridge. Fix this by checking pc->ch[0] before dereferencing it. Fixes: ae754f049ce1 ("drm/bridge: imx8qxp-pixel-combiner: get/put the next bridge") Fixes: 99764593528f ("drm/bridge: imx8qxp-pixel-combiner: convert to devm_drm_bridge_alloc() API") Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260123-imx8qxp-drm-bridge-fixes-v1-3-8bb85ada5866@nxp.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-02-03drm/panel: ilitek-ili9882t: Remove duplicate initializers in tianma_il79900a_dscNathan Chancellor1-4/+0
Clang warns (or errors with CONFIG_WERROR=y / W=e): drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:95:16: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides] 95 | .vbr_enable = 0, | ^ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:90:16: note: previous initialization is here 90 | .vbr_enable = false, | ^~~~~ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:97:19: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides] 97 | .rc_model_size = DSC_RC_MODEL_SIZE_CONST, | ^~~~~~~~~~~~~~~~~~~~~~~ include/drm/display/drm_dsc.h:22:38: note: expanded from macro 'DSC_RC_MODEL_SIZE_CONST' 22 | #define DSC_RC_MODEL_SIZE_CONST 8192 | ^~~~ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:91:19: note: previous initialization is here 91 | .rc_model_size = DSC_RC_MODEL_SIZE_CONST, | ^~~~~~~~~~~~~~~~~~~~~~~ include/drm/display/drm_dsc.h:22:38: note: expanded from macro 'DSC_RC_MODEL_SIZE_CONST' 22 | #define DSC_RC_MODEL_SIZE_CONST 8192 | ^~~~ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:132:25: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides] 132 | .initial_scale_value = 32, | ^~ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:126:25: note: previous initialization is here 126 | .initial_scale_value = 32, | ^~ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:133:20: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides] 133 | .nfl_bpg_offset = 3511, | ^~~~ drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:108:20: note: previous initialization is here 108 | .nfl_bpg_offset = 1402, | ^~~~ GCC would warn about this in the same manner but its version, -Woverride-init, is disabled for a normal kernel build in scripts/Makefile.warn. For clang, -Wextra in drivers/gpu/drm/Makefile turns it back but GCC respects turning it off earlier in the command line. Of all the duplicate fields in the initializer, only nfl_bpg_offset is a different value. Clear up the duplicate initializers, keeping the 'false' value for .vbr_enable, as it is bool, and the second value for .nfl_bpg_offset, assuming it is the correct one since it was the one tested in the original change. Fixes: 65ce1f5834e9 ("drm/panel: ilitek-ili9882t: Switch Tianma TL121BVMS07 to DSC 120Hz mode") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20260114-panel-ilitek-ili9882t-fix-override-init-v1-1-1d69a2b096df@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-02-02drm/i915/display: fix the pixel normalization handling for xe3p_lpdVinod Govindapillai4-24/+26
Pixel normalizer is enabled with normalization factor as 1.0 for FP16 formats in order to support FBC for those formats in xe3p_lpd. Previously pixel normalizer gets disabled during the plane disable routine. But there could be plane format settings without explicitly calling the plane disable in-between and we could endup keeping the pixel normalizer enabled for formats which we don't require that. This is causing crc mismatches in yuv formats and FIFO underruns in planar formats like NV12. Fix this by updating the pixel normalizer configuration based on the pixel formats explicitly during the plane settings arm calls itself - enable it for FP16 and disable it for other formats in HDR capable planes. v2: avoid redundant pixel normalization setting updates v3: moved the normalization factor definition to intel_fbc.c and some updates to comments v4: simplified the pixel normalizer setting handling Fixes: 5298eea7ed20 ("drm/i915/xe3p_lpd: use pixel normalizer for fp16 formats for FBC") Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Link: https://patch.msgid.link/20260130095919.107805-1-vinod.govindapillai@intel.com (cherry picked from commit c0dc68f4e2aa7eddb9ec6d95931f9576d8fe7334) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2026-02-02Merge tag 'exynos-drm-next-for-v6.20' of ↵Dave Airlie2-11/+64
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next Fix three regressions . Fix a regression where vidi_connection_ioctl() used the wrong device to look up the vidi context. It stores the vidi device in exynos_drm_private and uses it in ioctl(), preventing invalid pointer access and related bugs. . Fix a security regression where vidi_connection_ioctl() directly dereferenced a user pointer for EDID data. It copies EDID from user space with copy_from_user() into kernel memory before use, preventing arbitrary kernel memory access. . Fix a concurrency regression where vidi_context members related to EDID memory were accessed without locking. It protects alloc/free and state updates with ctx->lock, preventing race conditions and use-after-free bugs. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patch.msgid.link/20260201143939.27074-1-inki.dae@samsung.com
2026-02-01Merge tag 'amd-drm-next-6.20-2026-01-30' of ↵Dave Airlie75-576/+743
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.20-2026-01-30: amdgpu: - Misc cleanups - SMU 13 fixes - SMU 14 fixes - GPUVM fault filter fix - USB4 fixes - DC FP guard fixes - Powergating fix - JPEG ring reset fix - RAS fixes - Xclk fix for soc21 APUs - Fix COND_EXEC handling for GC 11 - UserQ fixes - MQD size alignment fixes - SMU feature interface cleanup - GC 10-12 KGQ init fixes - GC 11-12 KGQ reset fixes amdkfd: - Fix device snapshot reporting - GC 12.1 trap handler fixes - MQD size alignment fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260130183257.28879-1-alexander.deucher@amd.com
2026-02-01drm/exynos: vidi: use ctx->lock to protect struct vidi_context member ↵Jeongjun Park1-6/+32
variables related to memory alloc/free Exynos Virtual Display driver performs memory alloc/free operations without lock protection, which easily causes concurrency problem. For example, use-after-free can occur in race scenario like this: ``` CPU0 CPU1 CPU2 ---- ---- ---- vidi_connection_ioctl() if (vidi->connection) // true drm_edid = drm_edid_alloc(); // alloc drm_edid ... ctx->raw_edid = drm_edid; ... drm_mode_getconnector() drm_helper_probe_single_connector_modes() vidi_get_modes() if (ctx->raw_edid) // true drm_edid_dup(ctx->raw_edid); if (!drm_edid) // false ... vidi_connection_ioctl() if (vidi->connection) // false drm_edid_free(ctx->raw_edid); // free drm_edid ... drm_edid_alloc(drm_edid->edid) kmemdup(edid); // UAF!! ... ``` To prevent these vulns, at least in vidi_context, member variables related to memory alloc/free should be protected with ctx->lock. Cc: <stable@vger.kernel.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2026-02-01drm/exynos: vidi: fix to avoid directly dereferencing user pointerJeongjun Park1-4/+18
In vidi_connection_ioctl(), vidi->edid(user pointer) is directly dereferenced in the kernel. This allows arbitrary kernel memory access from the user space, so instead of directly accessing the user pointer in the kernel, we should modify it to copy edid to kernel memory using copy_from_user() and use it. Cc: <stable@vger.kernel.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2026-02-01drm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl()Jeongjun Park2-1/+14
vidi_connection_ioctl() retrieves the driver_data from drm_dev->dev to obtain a struct vidi_context pointer. However, drm_dev->dev is the exynos-drm master device, and the driver_data contained therein is not the vidi component device, but a completely different device. This can lead to various bugs, ranging from null pointer dereferences and garbage value accesses to, in unlucky cases, out-of-bounds errors, use-after-free errors, and more. To resolve this issue, we need to store/delete the vidi device pointer in exynos_drm_private->vidi_dev during bind/unbind, and then read this exynos_drm_private->vidi_dev within ioctl() to obtain the correct struct vidi_context pointer. Cc: <stable@vger.kernel.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2026-01-30accel/amdxdna: Fix memory leak in amdxdna_ubuf_mapZishun Yi1-2/+8
The amdxdna_ubuf_map() function allocates memory for sg and internal sg table structures, but it fails to free them if subsequent operations (sg_alloc_table_from_pages or dma_map_sgtable) fail. Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer") Signed-off-by: Zishun Yi <zishun.yi.dev@gmail.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Min Ma <mamin506@gmail.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260129171022.68578-1-zishun.yi.dev@gmail.com
2026-01-30accel/amdxdna: Stop job scheduling across aie2_release_resource()Lizhi Hou1-0/+6
Running jobs on a hardware context while it is in the process of releasing resources can lead to use-after-free and crashes. Fix this by stopping job scheduling before calling aie2_release_resource() and restarting it after the release completes. Additionally, aie2_sched_job_run() now checks whether the hardware context is still active. Fixes: 4fd6ca90fc7f ("accel/amdxdna: Refactor hardware context destroy routine") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260130003255.2083255-1-lizhi.hou@amd.com
2026-01-30accel/amdxdna: Hold mm structure across iommu_sva_unbind_device()Lizhi Hou2-0/+4
Some tests trigger a crash in iommu_sva_unbind_device() due to accessing iommu_mm after the associated mm structure has been freed. Fix this by taking an explicit reference to the mm structure after successfully binding the device, and releasing it only after the device is unbound. This ensures the mm remains valid for the entire SVA bind/unbind lifetime. Fixes: be462c97b7df ("accel/amdxdna: Add hardware context") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260128002356.1858122-1-lizhi.hou@amd.com
2026-01-30Merge tag 'drm-xe-next-fixes-2026-01-29' of ↵Dave Airlie3-3/+43
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next - Reduce LRC timestamp stuck message on VFs to notice (Brost) - Disable GuC Power DCC strategy on PTL (Vinay) - Unregister drm device on probe error (Lin) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aXuyrtsnlAOmj_OB@intel.com
2026-01-30Merge tag 'drm-misc-next-fixes-2026-01-29' of ↵Dave Airlie3-5/+14
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Two fixes for NULL pointer dereference in imx8 following the bridge refcounting conversions, and one for the bridge connector following the HDMI audio reworks. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260129-efficient-jerboa-of-ecstasy-822832@houat
2026-01-30Merge tag 'drm-intel-next-fixes-2026-01-29' of ↵Dave Airlie1-8/+21
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next - Prevent u64 underflow in intel_fbc_stolen_end Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patch.msgid.link/aXsWGWjacEJ03rTs@jlahtine-mobl
2026-01-29drm/amdgpu/gfx12: adjust KGQ reset sequenceAlex Deucher1-10/+13
Kernel gfx queues do not need to be reinitialized or remapped after a reset. Align with gfx11. v2: preserve init and remap for MMIO case. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx11: adjust KGQ reset sequenceAlex Deucher1-10/+13
Kernel gfx queues do not need to be reinitialized or remapped after a reset. This fixes queue reset failures on APUs. v2: preserve init and remap for MMIO case. Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx12: fix wptr reset in KGQ initAlex Deucher1-1/+1
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx11: fix wptr reset in KGQ initAlex Deucher1-1/+1
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx10: fix wptr reset in KGQ initAlex Deucher1-1/+1
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdkfd: Use AMDGPU_MQD_SIZE_ALIGN in gfx11+ kfd mqd managerLang Yu4-47/+17
MES is enabled by default from gfx11+, use AMDGPU_MQD_SIZE_ALIGN unconditionally for gfx11+. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdkfd: Adjust parameter of allocate_mqdLang Yu11-15/+24
Make allocate_mqd consistent with other callbacks. Prepare for next patch to use mqd_manager->mqd_size. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu: Use AMDGPU_MQD_SIZE_ALIGN in KGDLang Yu3-10/+13
Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amd/pm: Initialize allowed feature listLijo Lazar11-226/+158
Instead of returning feature bit mask of allowed features, initialize the allowed features in the callback implementation itself. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amd/pm: Remove unused logic in SMUv14.0.2Lijo Lazar1-39/+0
Remove commented and redundant logic in get_allowed_feature_mask implementation. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>