| Age | Commit message (Collapse) | Author | Files | Lines |
|
When user provides a bogus pat_index value through the madvise IOCTL, the
xe_pat_index_get_coh_mode() function performs an array access without
validating bounds. This allows a malicious user to trigger an out-of-bounds
kernel read from the xe->pat.table array.
The vulnerability exists because the validation in madvise_args_are_sane()
directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without
first checking if pat_index is within [0, xe->pat.n_entries).
Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug
builds, it still performs the unsafe array access in production kernels.
v2(Matthew Auld)
- Using array_index_nospec() to mitigate spectre attacks when the value
is used
v3(Matthew Auld)
- Put the declarations at the start of the block
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe")
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.18+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com
(cherry picked from commit 944a3329b05510d55c69c2ef455136e2fc02de29)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On some configs and old compilers we can get following build errors:
../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_mid_bb':
../drivers/gpu/drm/xe/xe_configfs.h:40:76: error: parameter name omitted
static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev, enum xe_engine_class,
^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_post_bb':
../drivers/gpu/drm/xe/xe_configfs.h:42:77: error: parameter name omitted
static inline u32 xe_configfs_get_ctx_restore_post_bb(struct pci_dev *pdev, enum xe_engine_class,
^~~~~~~~~~~~~~~~~~~~
when trying to define our configfs stub functions. Fix that.
Fixes: 7a4756b2fd04 ("drm/xe/lrc: Allow to add user commands mid context switch")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260203193745.576-1-michal.wajdeczko@intel.com
(cherry picked from commit f59cde8a2452b392115d2af8f1143a94725f4827)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
In case of devm_add_action_or_reset() failure the provided cleanup
action will be run immediately on the not yet initialized kobject.
This may lead to errors like:
[ ] kobject: '(null)' (ff110001393608e0): is not initialized, yet kobject_put() is being called.
[ ] WARNING: lib/kobject.c:734 at kobject_put+0xd9/0x250, CPU#0: kworker/0:0/9
[ ] RIP: 0010:kobject_put+0xdf/0x250
[ ] Call Trace:
[ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe]
[ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe]
[ ] xe_sriov_init_late+0x5f/0x2c0 [xe]
[ ] xe_device_probe+0x5f2/0xc20 [xe]
[ ] xe_pci_probe+0x396/0x610 [xe]
[ ] local_pci_probe+0x47/0xb0
[ ] refcount_t: underflow; use-after-free.
[ ] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x68/0xb0, CPU#0: kworker/0:0/9
[ ] RIP: 0010:refcount_warn_saturate+0x68/0xb0
[ ] Call Trace:
[ ] kobject_put+0x174/0x250
[ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe]
[ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe]
[ ] xe_sriov_init_late+0x5f/0x2c0 [xe]
[ ] xe_device_probe+0x5f2/0xc20 [xe]
[ ] xe_pci_probe+0x396/0x610 [xe]
[ ] local_pci_probe+0x47/0xb0
Fix that by calling kobject_init() and kobject_add() separately
and register cleanup action after the kobject is initialized.
Also make this cleanup registration a part of the create helper to
fix another mistake, as in the loop we were wrongly passing parent
kobject while registering cleanup action, and this resulted in some
undetected leaks.
Fixes: 5c170a4d9c53 ("drm/xe/pf: Prepare sysfs for SR-IOV admin attributes")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260203235332.1350-1-michal.wajdeczko@intel.com
(cherry picked from commit 98b16727f07e26a5d4de84d88805ce7ffcfdd324)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
- Fix CFI violation in debugfs access (Daniele)
- Kernel-doc fixes (Chaitanya, Shuicheng)
- Disable D3Cold for BMG only on specific platforms (Karthik)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aYStaLZVJWwKCDZt@intel.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Several fixes for amdxdna around PM handling, error reporting and
memory safety, a compilation fix for ilitek-ili9882t, a NULL pointer
dereference fix for imx8qxp-pixel-combiner and several PTE fixes for
nouveau
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260205-refreshing-natural-vole-4c73af@houat
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
- Fix the pixel normalization handling for xe3p_lpd display
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/aYROngKfyUIyoQW0@jlahtine-mobl
|
|
Restrict D3Cold disablement for BMG to unsupported NUC platforms,
instead of disabling it on all platforms.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 3e331a6715ee ("drm/xe/pm: Temporarily disable D3Cold on BMG")
Link: https://patch.msgid.link/20260123173238.1642383-1-karthik.poosa@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 39125eaf8863ab09d70c4b493f58639b08d5a897)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Correct the function name in the kerneldoc.
It is for below warning:
"Warning: drivers/gpu/drm/xe/xe_tlb_inval_job.c:210 expecting prototype for
xe_tlb_inval_alloc_dep(). Prototype was for xe_tlb_inval_job_alloc_dep()
instead"
Fixes: 15366239e2130 ("drm/xe: Decouple TLB invalidations from GT")
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129233834.419977-8-shuicheng.lin@intel.com
(cherry picked from commit 9f9c117ac566cb567dd56cc5b7564c45653f7a2a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Correct the function name in the kerneldoc.
It is for below warning:
"Warning: drivers/gpu/drm/xe/xe_tlb_inval.c:136 expecting prototype for
xe_gt_tlb_inval_init(). Prototype was for xe_gt_tlb_inval_init_early()
instead"
v2: add () for the function. (Michal)
Fixes: db16f9d90c1d9 ("drm/xe: Split TLB invalidation code in frontend and backend")
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129233834.419977-7-shuicheng.lin@intel.com
(cherry picked from commit 0651dbb9d6a72e99569576fbec4681fd8160d161)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Correct the function name in the kerneldoc.
It is for below warning:
"Warning: drivers/gpu/drm/xe/xe_migrate.c:1262 expecting prototype for
xe_get_migrate_exec_queue(). Prototype was for xe_migrate_exec_queue()
instead"
Fixes: 916ee4704a865 ("drm/xe/vf: Register CCS read/write contexts with Guc")
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129233834.419977-6-shuicheng.lin@intel.com
(cherry picked from commit 9fd8da717934f05125b9ba6782622c459a368dc0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The topology query helper advanced the user pointer by the size
of the pointer, not the size of the structure. This can misalign
the output blob and corrupt the following mask. Fix the increment
to use sizeof(*topo).
There is no issue currently, as sizeof(*topo) happens to be equal
to sizeof(topo) on 64-bit systems (both evaluate to 8 bytes).
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260130043907.465128-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit c2a6859138e7f73ad904be17dd7d1da6cc7f06b3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The GuC scheduler ABI header contains a file-level comment that is not
intended to document a kernel-doc symbol. Using kernel-doc comment
syntax (/** */) triggers kernel-doc warnings.
With "-Werror", this causes the build to fail. Convert the comment to a
regular block comment.
HDRTEST drivers/gpu/drm/xe/abi/guc_scheduler_abi.h
Warning: drivers/gpu/drm/xe/abi/guc_scheduler_abi.h:11 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst
* Generic defines required for registration with and submissions to the GuC
1 warnings as errors
make[6]: *** [drivers/gpu/drm/xe/Makefile:377: drivers/gpu/drm/xe/abi/guc_scheduler_abi.hdrtest] Error 3
make[5]: *** [scripts/Makefile.build:544: drivers/gpu/drm/xe] Error 2
make[4]: *** [scripts/Makefile.build:544: drivers/gpu/drm] Error 2
make[3]: *** [scripts/Makefile.build:544: drivers/gpu] Error 2
make[2]: *** [scripts/Makefile.build:544: drivers] Error 2
make[1]: *** [/home/kbuild2/kernel/Makefile:2088: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2
v2:
- Add Fixes tag (Daniele)
Fixes: b0c5cf4f5917 ("drm/gt/guc: extract scheduler-related defines from guc_fwif.h")
Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260130135210.2659200-1-chaitanya.kumar.borah@intel.com
(cherry picked from commit f89dbe14a0c8854b7aaf960dd842c10698b3ff19)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
xe_guc_print_info is void-returning, but the function pointer it is
assigned to expects an int-returning function, leading to the following
CFI error:
[ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe]
(target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a)
Fix this by updating xe_guc_print_info to return an integer.
Fixes: e15826bb3c2c ("drm/xe/guc: Refactor GuC debugfs initialization")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: George D Sworo <george.d.sworo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com
(cherry picked from commit dd8ea2f2ab71b98887fdc426b0651dbb1d1ea760)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Currently, amdxdna_pm_resume_get() is called during job creation, and
amdxdna_pm_suspend_put() is called when the hardware notifies job
completion. If a job is canceled before it is run, no hardware
completion notification is generated, resulting in an unbalanced
runtime PM resume/suspend pair.
Fix this by moving amdxdna_pm_resume_get() to the job run path, ensuring
runtime PM is only resumed for jobs that are actually executed.
Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260204171118.3165607-1-lizhi.hou@amd.com
|
|
The suspend routine sets the DPM level to 0, which unintentionally
overwrites the previously saved DPM level. As a result, the device always
resumes with DPM level 0 instead of restoring the original value.
Fix this by ensuring the suspend path does not overwrite the saved DPM
level, allowing the correct DPM level to be restored during resume.
Fixes: f4d7b8a6bc8c ("accel/amdxdna: Enhance power management settings")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260204171048.3165580-1-lizhi.hou@amd.com
|
|
When NVK enabled large pages userspace tests were seeing fault
reports at a valid address.
There was a case where an address moving from 64k page to 4k pages
could expose a race between unmapping the 4k page, mapping the 64k
page and unref the 4k pages.
Unref 4k pages would cause the dual-page table handling to always
set the LPTE entry to SPARSE or INVALID, but if we'd mapped a valid
LPTE in the meantime, it would get trashed. Keep track of when
a valid LPTE has been referenced, and don't reset in that case.
This adds an lpte valid tracker and lpte reference count.
Whenever an lpte is referenced, it gets made valid and the ref count
increases, whenever it gets unreference the refcount is tracked.
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14610
Reviewed-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260204030208.2313241-4-airlied@gmail.com
|
|
We need to tracker large counts of spte than previously due to unref
getting delayed sometimes.
This doesn't fix LPT tracking yet, it just creates space for it.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260204030208.2313241-3-airlied@gmail.com
|
|
I want to increase the counters here and start tracking LPTs as well
as there are certain situations where userspace with mixed page sizes
can cause ref/unrefs to live longer so need better reference counting.
This should be entirely non-functional.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260204030208.2313241-2-airlied@gmail.com
|
|
The driver currently returns an incorrect error code when a chain command
fails. In this case, ERT_CMD_STATE_ERROR is expected to be reported for
failed chain commands.
Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260203184037.2751889-1-lizhi.hou@amd.com
|
|
One newly supported command does not require hardware context configuration
to be performed upfront. As a result, checking hardware context status
causes this command to fail incorrectly.
Remove hardware context status handling entirely. For other commands,
if userspace submits a request without configuring the hardware context
first, the firmware will report an error or time out as appropriate.
Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260202212450.2681273-1-lizhi.hou@amd.com
|
|
In case the channel0 is unavailable and bailing out from free_child is
needed when we fail to add a DRM bridge for the available channel1,
pointer pc->ch[0] in the bailout path would be NULL and it would be
dereferenced as pc->ch[0]->bridge.next_bridge. Fix this by checking
pc->ch[0] before dereferencing it.
Fixes: ae754f049ce1 ("drm/bridge: imx8qxp-pixel-combiner: get/put the next bridge")
Fixes: 99764593528f ("drm/bridge: imx8qxp-pixel-combiner: convert to devm_drm_bridge_alloc() API")
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260123-imx8qxp-drm-bridge-fixes-v1-3-8bb85ada5866@nxp.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
|
|
Clang warns (or errors with CONFIG_WERROR=y / W=e):
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:95:16: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides]
95 | .vbr_enable = 0,
| ^
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:90:16: note: previous initialization is here
90 | .vbr_enable = false,
| ^~~~~
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:97:19: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides]
97 | .rc_model_size = DSC_RC_MODEL_SIZE_CONST,
| ^~~~~~~~~~~~~~~~~~~~~~~
include/drm/display/drm_dsc.h:22:38: note: expanded from macro 'DSC_RC_MODEL_SIZE_CONST'
22 | #define DSC_RC_MODEL_SIZE_CONST 8192
| ^~~~
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:91:19: note: previous initialization is here
91 | .rc_model_size = DSC_RC_MODEL_SIZE_CONST,
| ^~~~~~~~~~~~~~~~~~~~~~~
include/drm/display/drm_dsc.h:22:38: note: expanded from macro 'DSC_RC_MODEL_SIZE_CONST'
22 | #define DSC_RC_MODEL_SIZE_CONST 8192
| ^~~~
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:132:25: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides]
132 | .initial_scale_value = 32,
| ^~
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:126:25: note: previous initialization is here
126 | .initial_scale_value = 32,
| ^~
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:133:20: error: initializer overrides prior initialization of this subobject [-Werror,-Winitializer-overrides]
133 | .nfl_bpg_offset = 3511,
| ^~~~
drivers/gpu/drm/panel/panel-ilitek-ili9882t.c:108:20: note: previous initialization is here
108 | .nfl_bpg_offset = 1402,
| ^~~~
GCC would warn about this in the same manner but its version,
-Woverride-init, is disabled for a normal kernel build in
scripts/Makefile.warn. For clang, -Wextra in drivers/gpu/drm/Makefile
turns it back but GCC respects turning it off earlier in the command
line.
Of all the duplicate fields in the initializer, only nfl_bpg_offset is a
different value. Clear up the duplicate initializers, keeping the
'false' value for .vbr_enable, as it is bool, and the second value for
.nfl_bpg_offset, assuming it is the correct one since it was the one
tested in the original change.
Fixes: 65ce1f5834e9 ("drm/panel: ilitek-ili9882t: Switch Tianma TL121BVMS07 to DSC 120Hz mode")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260114-panel-ilitek-ili9882t-fix-override-init-v1-1-1d69a2b096df@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Pixel normalizer is enabled with normalization factor as 1.0 for
FP16 formats in order to support FBC for those formats in xe3p_lpd.
Previously pixel normalizer gets disabled during the plane disable
routine. But there could be plane format settings without explicitly
calling the plane disable in-between and we could endup keeping the
pixel normalizer enabled for formats which we don't require that.
This is causing crc mismatches in yuv formats and FIFO underruns in
planar formats like NV12. Fix this by updating the pixel normalizer
configuration based on the pixel formats explicitly during the plane
settings arm calls itself - enable it for FP16 and disable it for
other formats in HDR capable planes.
v2: avoid redundant pixel normalization setting updates
v3: moved the normalization factor definition to intel_fbc.c and some
updates to comments
v4: simplified the pixel normalizer setting handling
Fixes: 5298eea7ed20 ("drm/i915/xe3p_lpd: use pixel normalizer for fp16 formats for FBC")
Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patch.msgid.link/20260130095919.107805-1-vinod.govindapillai@intel.com
(cherry picked from commit c0dc68f4e2aa7eddb9ec6d95931f9576d8fe7334)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
Fix three regressions
. Fix a regression where vidi_connection_ioctl() used the wrong device
to look up the vidi context. It stores the vidi device in exynos_drm_private
and uses it in ioctl(), preventing invalid pointer access and related bugs.
. Fix a security regression where vidi_connection_ioctl() directly dereferenced
a user pointer for EDID data. It copies EDID from user space
with copy_from_user() into kernel memory before use, preventing arbitrary
kernel memory access.
. Fix a concurrency regression where vidi_context members related
to EDID memory were accessed without locking. It protects alloc/free and
state updates with ctx->lock, preventing race conditions and use-after-free bugs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patch.msgid.link/20260201143939.27074-1-inki.dae@samsung.com
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.20-2026-01-30:
amdgpu:
- Misc cleanups
- SMU 13 fixes
- SMU 14 fixes
- GPUVM fault filter fix
- USB4 fixes
- DC FP guard fixes
- Powergating fix
- JPEG ring reset fix
- RAS fixes
- Xclk fix for soc21 APUs
- Fix COND_EXEC handling for GC 11
- UserQ fixes
- MQD size alignment fixes
- SMU feature interface cleanup
- GC 10-12 KGQ init fixes
- GC 11-12 KGQ reset fixes
amdkfd:
- Fix device snapshot reporting
- GC 12.1 trap handler fixes
- MQD size alignment fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260130183257.28879-1-alexander.deucher@amd.com
|
|
variables related to memory alloc/free
Exynos Virtual Display driver performs memory alloc/free operations
without lock protection, which easily causes concurrency problem.
For example, use-after-free can occur in race scenario like this:
```
CPU0 CPU1 CPU2
---- ---- ----
vidi_connection_ioctl()
if (vidi->connection) // true
drm_edid = drm_edid_alloc(); // alloc drm_edid
...
ctx->raw_edid = drm_edid;
...
drm_mode_getconnector()
drm_helper_probe_single_connector_modes()
vidi_get_modes()
if (ctx->raw_edid) // true
drm_edid_dup(ctx->raw_edid);
if (!drm_edid) // false
...
vidi_connection_ioctl()
if (vidi->connection) // false
drm_edid_free(ctx->raw_edid); // free drm_edid
...
drm_edid_alloc(drm_edid->edid)
kmemdup(edid); // UAF!!
...
```
To prevent these vulns, at least in vidi_context, member variables related
to memory alloc/free should be protected with ctx->lock.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
In vidi_connection_ioctl(), vidi->edid(user pointer) is directly
dereferenced in the kernel.
This allows arbitrary kernel memory access from the user space, so instead
of directly accessing the user pointer in the kernel, we should modify it
to copy edid to kernel memory using copy_from_user() and use it.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
vidi_connection_ioctl() retrieves the driver_data from drm_dev->dev to
obtain a struct vidi_context pointer. However, drm_dev->dev is the
exynos-drm master device, and the driver_data contained therein is not
the vidi component device, but a completely different device.
This can lead to various bugs, ranging from null pointer dereferences and
garbage value accesses to, in unlucky cases, out-of-bounds errors,
use-after-free errors, and more.
To resolve this issue, we need to store/delete the vidi device pointer in
exynos_drm_private->vidi_dev during bind/unbind, and then read this
exynos_drm_private->vidi_dev within ioctl() to obtain the correct
struct vidi_context pointer.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|
The amdxdna_ubuf_map() function allocates memory for sg and
internal sg table structures, but it fails to free them if subsequent
operations (sg_alloc_table_from_pages or dma_map_sgtable) fail.
Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer")
Signed-off-by: Zishun Yi <zishun.yi.dev@gmail.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Reviewed-by: Min Ma <mamin506@gmail.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260129171022.68578-1-zishun.yi.dev@gmail.com
|
|
Running jobs on a hardware context while it is in the process of
releasing resources can lead to use-after-free and crashes.
Fix this by stopping job scheduling before calling
aie2_release_resource() and restarting it after the release completes.
Additionally, aie2_sched_job_run() now checks whether the hardware
context is still active.
Fixes: 4fd6ca90fc7f ("accel/amdxdna: Refactor hardware context destroy routine")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260130003255.2083255-1-lizhi.hou@amd.com
|
|
Some tests trigger a crash in iommu_sva_unbind_device() due to
accessing iommu_mm after the associated mm structure has been
freed.
Fix this by taking an explicit reference to the mm structure
after successfully binding the device, and releasing it only
after the device is unbound. This ensures the mm remains valid
for the entire SVA bind/unbind lifetime.
Fixes: be462c97b7df ("accel/amdxdna: Add hardware context")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260128002356.1858122-1-lizhi.hou@amd.com
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
- Reduce LRC timestamp stuck message on VFs to notice (Brost)
- Disable GuC Power DCC strategy on PTL (Vinay)
- Unregister drm device on probe error (Lin)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aXuyrtsnlAOmj_OB@intel.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Two fixes for NULL pointer dereference in imx8 following the bridge
refcounting conversions, and one for the bridge connector following the
HDMI audio reworks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260129-efficient-jerboa-of-ecstasy-822832@houat
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
- Prevent u64 underflow in intel_fbc_stolen_end
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/aXsWGWjacEJ03rTs@jlahtine-mobl
|
|
Kernel gfx queues do not need to be reinitialized or
remapped after a reset. Align with gfx11.
v2: preserve init and remap for MMIO case.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Kernel gfx queues do not need to be reinitialized or
remapped after a reset. This fixes queue reset failures
on APUs.
v2: preserve init and remap for MMIO case.
Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
MES is enabled by default from gfx11+, use AMDGPU_MQD_SIZE_ALIGN
unconditionally for gfx11+.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make allocate_mqd consistent with other callbacks.
Prepare for next patch to use mqd_manager->mqd_size.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead of returning feature bit mask of allowed features, initialize
the allowed features in the callback implementation itself.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove commented and redundant logic in get_allowed_feature_mask
implementation.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead of using bitmap operations, add wrapper interface functions to
operate on smu features.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a bitmap struct to represent smu feature bits and functions to set/clear features.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
MES FW uses address(mqd_addr + sizeof(struct mqd) + 3*sizeof(uint32_t))
as fence address and writes a 32 bit fence value to this address. Driver
needs to allocate some extra memory(at least 4 DWs) in addition to
sizeof(struct mqd) as mqd memory(limited to gfx/compute/sdma queue).
For gfx11/12, sizeof(struct mqd) < PAGE_SIZE, KGD allocates mqd memory with
PAGE_SIZE aligned works. For gfx12.1, sizeof(struct mqd) == PAGE_SIZE,
it doesn't work.
KFD mqd manager hardcodes mqd size to PAGE_SIZE/MQD_SIZE across different
IP versions to solve this issue.
To avoid hardcoding in differnet places and across different IP versions.
Let's use AMDGPU_MQD_SIZE_ALIGN instead. It is used in two places.
1. mqd memory alloction
2. mqd stride handling for multi xcc config
v2: Use AMDGPU_GPU_PAGE_ALIGN. (Mukul)
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com> (v1)
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add validation to ensure user queue sizes meet hardware requirements:
- Size must be a power of two for efficient ring buffer wrapping
- Size must be at least AMDGPU_GPU_PAGE_SIZE to prevent undersized allocations
This prevents invalid configurations that could lead to GPU faults or
unexpected behavior.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The EXEC_COUNT field must be > 0. In the gfx shadow
handling we always emit a cond_exec packet after the gfx_shadow
packet, but the EXEC_COUNT never gets patched. This leads
to a hang when we try and reset queues on gfx11 APUs.
Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The reference clock is supposed to be 100Mhz, but it
appears to actually be slightly lower (99.81Mhz).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14451
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|