summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-12-22drm/xe/pf: Initialize scheduler groupsDaniele Ceraolo Spurio6-0/+198
Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC feature that allows the driver to define groups of engines that are independently scheduled across VFs, which allows different VFs to be active on the HW at the same time on different groups. The feature is available for BMG and newer HW starting on GuC 70.53.0, but some required fixes have been added to GuC 70.55.1. This is intended for specific scenarios where the admin knows that the VFs are not going to fully utilize the HW and therefore assigning all of it to a single VF would lead to part of it being permanently idle. We do not allow the admin to decide how to divide the engines across groups, but we instead support specific configurations that are designed for specific use-cases. During PF initialization we detect which configurations are possible on a given GT and create the relevant groups. Since the GuC expect a mask for each class for each group, that is what we save when we init the configs. Right now we only have one use-case on the media GT. If the VFs are running a frame render + encoding at a not-too-high resolution (e.g. 1080@30fps) the render can produce frames faster than the video engine can encode them, which means that the maximum number of parallel VFs is limited by the VCS bandwidth. Since our products can have multiple VCS engines, allowing multiple VFs to be active on the different VCS engines at the same time allows us to run more parallel VFs on the same HW. Given that engines in the same media slice share some resources (e.g. SFC), we assign each media slice to a different scheduling group. We refer to this configuration as "media_slices", given that each slice gets its own group. Since upcoming products have a different number of video engines per-slice, for now we limit the media_slices mode to BMG, but we expect to add support for newer HW soon. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251218223846.1146344-17-daniele.ceraolospurio@intel.com
2025-12-22drm/gt/guc: extract scheduler-related defines from guc_fwif.hDaniele Ceraolo Spurio4-54/+67
Some upcoming KLVs are sized based on the engine counts, so we need those defines to be moved to a separate file to include them from guc_klv_abi.h (which is already included by guc_fwif.h). Instead of moving just the engine-related defines, it is cleaner to move all scheduler-related defines (i.e., everything engine or context related). Note that the legacy GuC defines have not been moved and have instead been dropped because Xe doesn't support any GuC old enough to still use them. While at it, struct guc_ctxt_registration_info has been moved to guc_submit.c since it doesn't come from the GuC specs (we added it to make things simpler in our code). Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251218223846.1146344-16-daniele.ceraolospurio@intel.com
2025-12-22drm/xe/gt: Add engine masks for each classDaniele Ceraolo Spurio5-8/+15
Follow up patches will need the engine masks for VCS and VECS engines. Since we already have a macro for the CCS engines, just extend the same approach to all classes. To avoid confusion with the XE_HW_ENGINE_*_MASK masks, the new macros use the _INSTANCES suffix instead. For consistency, rename CCS_MASK to CCS_INSTANCES as well. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251218223846.1146344-15-daniele.ceraolospurio@intel.com
2025-12-19drm/xe: Print GuC queue submission state on engine resetMatthew Brost1-2/+3
Print the GuC queue submission state when an engine reset occurs, as this provides clues about the cause of the reset. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251218224546.4057424-1-matthew.brost@intel.com
2025-12-19drm/xe: Increase log level for unhandled page faultsMatthew Brost2-24/+24
Set the kernel log level for unhandled page faults to match the log level (info) for engine resets. Currently, dmesg output can be confusing because it shows an engine reset without indicating the page fault that caused it. Without this change, the GuC log must be examined to determine the source of the engine reset. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20251218223745.4045207-1-matthew.brost@intel.com
2025-12-19drm/xe/xe_survivability: Add index bound checkRiana Tauro1-3/+7
Fix static analysis tool reported issue. Add index bound check before accessing info array to prevent out of bound. Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode") Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251219105224.871930-6-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-19drm/xe/xe_survivability: Use static for survivability info attributesRiana Tauro1-9/+9
Fix sparse warnings. Use static for survivability info attributes. Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512101919.G12cuhBJ-lkp@intel.com/ Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251219105224.871930-5-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-19drm/xe/pmu: Replace sprintf() with sysfs_emit()Madhur Kumar1-1/+1
Replace sprintf() calls with sysfs_emit() to follow current kernel coding standards. sysfs_emit() is the preferred method for formatting sysfs output as it provides better bounds checking and is more secure. Signed-off-by: Madhur Kumar <madhurkumar004@gmail.com> Link: https://patch.msgid.link/20251214083659.2412218-1-madhurkumar004@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo adjusted commit message while pushing it]
2025-12-19Merge drm/drm-next into drm-xe-nextThomas Hellström10864-167997/+545722
Backmerging to bring in 6.19-rc1. An important upstream bugfix and to help unblock PTL CI. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-19drm/xe: Fix documentation heading levels in xe_guc_pc.cSwaraj Gaikwad1-2/+2
Sphinx reports htmldocs warnings: Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:76: ERROR: A level 2 section cannot be used here. Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:87: ERROR: A level 2 section cannot be used here. The xe_guc_pc.c documentation is included inside xe_firmware.rst. The headers in the C file currently use '=' underlines, which conflict with the parent document's section levels. Fix this by demoting "Frequency management" and "Render-C States" headers from '=' to '-' to correctly nest them as subsections. Build environment: Python 3.13.7 Sphinx 8.2.3 docutils 0.22.3 Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com> Link: https://patch.msgid.link/20251209094836.18589-1-swarajgaikwad1925@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-19drm/xe/xe_survivability: Remove unused indexRiana Tauro1-11/+4
Remove unused index variable and fix for loop. Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/intel-xe/20251210075757.GA1206705@ax162/ Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251218105151.586575-5-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-19drm/xe/nvm: enable cri platformAlexander Usyskin2-11/+24
Mark CRI as one that have the CSC NVM device. Update the writable override flow to take the information from the scratch register for CRI. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251216111034.3093507-1-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-18drm/xe: Drop preempt-fences when destroying imported dma-bufs.Thomas Hellström1-11/+4
When imported dma-bufs are destroyed, TTM is not fully individualizing the dma-resv, but it *is* copying the fences that need to be waited for before declaring idle. So in the case where the bo->resv != bo->_resv we can still drop the preempt-fences, but make sure we do that on bo->_resv which contains the fence-pointer copy. In the case where the copying fails, bo->_resv will typically not contain any fences pointers at all, so there will be nothing to drop. In that case, TTM would have ensured all fences that would have been copied are signaled, including any remaining preempt fences. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fa0af721bd1f ("drm/ttm: test private resv obj on release/destroy") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Tested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217093441.5073-1-thomas.hellstrom@linux.intel.com
2025-12-18drm/xe/eustall: Disallow 0 EU stall property valuesAshutosh Dixit1-1/+1
An EU stall property value of 0 is invalid and will cause a NPD. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6453 Fixes: 1537ec85ebd7 ("drm/xe/uapi: Introduce API for EU stall sampling") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-4-ashutosh.dixit@intel.com
2025-12-18drm/xe/oa: Disallow 0 OA property valuesAshutosh Dixit1-1/+1
An OA property value of 0 is invalid and will cause a NPD. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6452 Fixes: cc4e6994d5a2 ("drm/xe/oa: Move functions up so they can be reused for config ioctl") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-3-ashutosh.dixit@intel.com
2025-12-18drm/xe/oa: Move default oa unit assignment earlier during stream openAshutosh Dixit1-4/+4
De-referencing param.oa_unit, when an OA unit id is not provided during stream open, results in NPD below. Oops: general protection fault, probably for non-canonical address... KASAN: null-ptr-deref in range... RIP: 0010:xe_oa_stream_open_ioctl+0x169/0x38a0 xe_observation_ioctl+0x19f/0x270 drm_ioctl_kernel+0x1f4/0x410 Fix this by moving default oa unit assignment before the dereference. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6840 Fixes: c7e269aa565f ("drm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-2-ashutosh.dixit@intel.com
2025-12-18drm/xe/pf: Add handling for MLRC adverse event thresholdDaniele Ceraolo Spurio2-0/+10
Since it is illegal to register a MLRC context when scheduler groups are enabled, the GuC consider the VF doing so as an adverse event. Like for other adverse event, there is a threshold for how many times the event can happen before the GuC throws an error, which we need to add support for. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-5-michal.wajdeczko@intel.com
2025-12-18drm/xe/pf: Prepare for new threshold KLVsMichal Wajdeczko3-10/+23
We want to extend our macro-based KLV list definitions with new information about the version from which given KLV is supported. Prepare our code generators to emit dedicated version check if a KLV was defined with the version information. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-4-michal.wajdeczko@intel.com
2025-12-18drm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helperMichal Wajdeczko3-3/+24
There are already few places in the code where we need to check GuC firmware version. Wrap existing raw conditions into a named helper macro to make it clear and avoid explicit call of the MAKE_GUC_VER. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com
2025-12-18drm/xe: Introduce IF_ARGS macro utilityMichal Wajdeczko2-0/+81
We want to extend our macro-based KLV list definitions with new information about the version from which given KLV is supported. Add utility IF_ARGS macro that can be used in code generators to emit different code based on the presence of additional arguments. Introduce macro itself and extend our kunit tests to cover it. We will use this macro in next patch. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217224018.3490-1-michal.wajdeczko@intel.com
2025-12-18drm/xe: Fix NULL pointer dereference in xe_exec_ioctlTapani Pälli1-6/+4
Helper function xe_sync_needs_wait expects sync->fence when accessing flags, patch makes sure we call only when sync->fence exists. v2: move null checking to xe_sync_needs_wait and make xe_sync_entry_wait utilize this helper (Matthew Auld) v3: further simplify code (Matthew Auld) Fixes NULL pointer dereference seen with Vulkan workloads: [ 118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe] Fixes: 4ac9048d0501 ("drm/xe: Wait on in-syncs when swicthing to dma-fence mode") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217132412.435755-1-tapani.palli@intel.com
2025-12-17MAINTAINERS: Update Xe driver maintainersRodrigo Vivi1-0/+1
Add Matt Brost, one of the Xe driver creators, as maintainer. Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251204193403.930328-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-17drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()Dan Carpenter1-1/+1
The xe_sriov_vfio_migration_supported() function is type bool so returning -EPERM means returning true. Return false instead. Fixes: 17f22465c5a5 ("drm/xe/pf: Export helpers for VFIO") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTLEZ4g-FD-iMQ2V@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-12-17drm/xe/vf: fix return type in vf_migration_init_late()Dan Carpenter1-1/+1
The vf_migration_init_late() function is supposed to return zero on success and negative error codes on failure. The error code eventually gets propagated back to the probe() function and returned. The problem is it's declared as type bool so it returns true on error. Change it to type int instead. Fixes: 2e2dab20dd66 ("drm/xe/vf: Enable VF migration only on supported GuC versions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTK9pwJ_roc8vpDi@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-12-17drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUMEAshutosh Dixit1-3/+4
Reports can be written out to the OA buffer using ways other than periodic sampling. These include mmio trigger and context switches. To support these use cases, when periodic sampling is not enabled, OAG_OAGLBCTXCTRL_COUNTER_RESUME must be set. Fixes: 1db9a9dc90ae ("drm/xe/oa: OA stream initialization (OAG)") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-4-ashutosh.dixit@intel.com
2025-12-17drm/xe/rtp: Whitelist OAMERT MMIO trigger registersAshutosh Dixit1-0/+19
Whitelist OAMERT registers to enable userspace to execute MMIO triggers on OAMERT units. Registers are whitelisted for compute and copy class engines. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-3-ashutosh.dixit@intel.com
2025-12-17drm/xe/oa/uapi: Expose MERT OA unitAshutosh Dixit3-3/+46
A MERT OA unit is available in the SoC on some platforms. Add support for this OA unit and expose it to userspace. The MERT OA unit does not have any HW engines attached, but is otherwise similar to an OAM unit. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20251205212613.826224-2-ashutosh.dixit@intel.com
2025-12-16drm/xe: Add more GT stats around pagefault mode switch flowsMatthew Brost3-0/+30
Add GT stats to measure the time spent switching between pagefault mode and dma-fence mode. Also add a GT stat to indicate when pagefault suspend is skipped because the system is idle. These metrics will help profile pagefault workloads while 3D and display are enabled. v2: - Use GT stats helper functions (Francois) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251212182847.1683222-8-matthew.brost@intel.com
2025-12-16drm/xe: Add GT stats ktime helpersMatthew Brost2-20/+41
Normalize GT stats that record execution periods in code paths by adding helpers to perform the ktime calculation. Use these helpers in the SVM code. Suggested-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-7-matthew.brost@intel.com
2025-12-16drm/xe: Wait on in-syncs when swicthing to dma-fence modeMatthew Brost5-11/+87
If a dma-fence submission has in-fences and pagefault queues are running work, there is little incentive to kick the pagefault queues off the hardware until the dma-fence submission is ready to run. Therefore, wait on the in-fences of the dma-fence submission before removing the pagefault queues from the hardware. v2: - Fix kernel doc (CI) - Don't wait under lock (Thomas) - Make wait interruptable Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-6-matthew.brost@intel.com
2025-12-16drm/xe: Skip exec queue schedule toggle if queue is idle during suspendMatthew Brost3-4/+70
If an exec queue is idle, there is no need to issue a schedule disable to the GuC when suspending the queue’s execution. Opportunistically skip this step if the queue is idle and not a parallel queue. Parallel queues must have their scheduling state flipped in the GuC due to limitations in how submission is implemented in run_job(). Also if all pagefault queues can skip the schedule disable during a switch to dma-fence mode, do not schedule a resume for the pagefault queues after the next submission. v2: - Don't touch the LRC tail is queue is suspended but enabled in run_job (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-5-matthew.brost@intel.com
2025-12-16drm/xe: Add debugfs knobs to control long running workload timeslicingMatthew Brost4-2/+83
Add debugfs knobs to control timeslicing for long-running workloads, allowing quick tuning of values when running benchmarks. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-4-matthew.brost@intel.com
2025-12-16drm/xe: Use usleep_range for accurate long-running workload timeslicingMatthew Brost1-1/+19
msleep is not very accurate in terms of how long it actually sleeps, whereas usleep_range is precise. Replace the timeslice sleep for long-running workloads with the more accurate usleep_range to avoid jitter if the sleep period is less than 20ms. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-3-matthew.brost@intel.com
2025-12-16drm/xe: Adjust long-running workload timeslices to reasonable valuesMatthew Brost2-2/+5
A 10ms timeslice for long-running workloads is far too long and causes significant jitter in benchmarks when the system is shared. Adjust the value to 5ms for preempt-fencing VMs, as the resume step there is quite costly as memory is moved around, and set it to zero for pagefault VMs, since switching back to pagefault mode after dma-fence mode is relatively fast. Also change min_run_period_ms to 'unsiged int' type rather than 's64' as only positive values make sense. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-2-matthew.brost@intel.com
2025-12-16drm/xe/oa: Limit num_syncs to prevent oversized allocationsShuicheng Lin1-0/+3
The OA open parameters did not validate num_syncs, allowing userspace to pass arbitrarily large values, potentially leading to excessive allocations. Add check to ensure that num_syncs does not exceed DRM_XE_MAX_SYNCS, returning -EINVAL when the limit is violated. v2: use XE_IOCTL_DBG() and drop duplicated check. (Ashutosh) Fixes: c8507a25cebd ("drm/xe/oa/uapi: Define and parse OA sync properties") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251205234715.2476561-6-shuicheng.lin@intel.com
2025-12-16drm/xe: Limit num_syncs to prevent oversized allocationsShuicheng Lin3-1/+6
The exec and vm_bind ioctl allow userspace to specify an arbitrary num_syncs value. Without bounds checking, a very large num_syncs can force an excessively large allocation, leading to kernel warnings from the page allocator as below. Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request exceeding this limit. " ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124 ... Call Trace: <TASK> alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416 ___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317 __kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348 __do_kmalloc_node mm/slub.c:4364 [inline] __kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388 kmalloc_noprof include/linux/slab.h:909 [inline] kmalloc_array_noprof include/linux/slab.h:948 [inline] xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158 drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797 drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894 xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:598 [inline] __se_sys_ioctl fs/ioctl.c:584 [inline] __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... " v2: Add "Reported-by" and Cc stable kernels. v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh) v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt) v5: Do the check at the top of the exec func. (Matt) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Koen Koning <koen.koning@intel.com> Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450 Cc: <stable@vger.kernel.org> # v6.12+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Ivan Briano <ivan.briano@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com
2025-12-16drm/xe/guc: Fix version check for page-reclaim featureMichal Wajdeczko1-1/+1
Page reclamation interfaces were introduced in GuC firmware version 70.31.0 (which corresponds to GuC ABI version 1.14.0), but since this feature is also available for the VFs and VFs don't know the firmware version, use GuC compatibility version check instead. Fixes: 77ebc7c10d16 ("drm/xe/guc: Add page reclamation interface to GuC") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Brian Nguyen <brian3.nguyen@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251215170433.196398-1-michal.wajdeczko@intel.com
2025-12-14Linux 6.19-rc1v6.19-rc1Linus Torvalds1-2/+2
2025-12-14Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds18-56/+165
Pull SCSI fixes from James Bottomley: "The only core fix is in doc; all the others are in drivers, with the biggest impacts in libsas being the rollback on error handling and in ufs coming from a couple of error handling fixes, one causing a crash if it's activated before scanning and the other fixing W-LUN resumption" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: qcom: Fix confusing cleanup.h syntax scsi: libsas: Add rollback handling when an error occurs scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name() scsi: ufs: core: Fix a deadlock in the frequency scaling code scsi: ufs: core: Fix an error handler crash scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies scsi: qla4xxx: Use time conversion macros scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset scsi: imm: Fix use-after-free bug caused by unfinished delayed work scsi: target: sbp: Remove KMSG_COMPONENT macro scsi: core: Correct documentation for scsi_device_quiesce() scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1 scsi: target: Reset t_task_cdb pointer in error case scsi: ufs: core: Fix EH failure after W-LUN resume error
2025-12-14Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds9-79/+316
Pull ceph updates from Ilya Dryomov: "We have a patch that adds an initial set of tracepoints to the MDS client from Max, a fix that hardens osdmap parsing code from myself (marked for stable) and a few assorted fixups" * tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client: rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES libceph: make decode_pool() more resilient against corrupted osdmaps libceph: Amend checking to fix `make W=1` build breakage ceph: Amend checking to fix `make W=1` build breakage ceph: add trace points to the MDS client libceph: fix log output race condition in OSD client
2025-12-14Merge tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyoLinus Torvalds1-7/+2
Pull tomoyo update from Tetsuo Handa: "Trivial optimization" * tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo: tomoyo: Use local kmap in tomoyo_dump_page()
2025-12-13Merge tag 'smp-urgent-2025-12-12' of ↵Linus Torvalds1-9/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: - Fix CPU hotplug callbacks to disable interrupts on UP kernels * tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: Make atomic hotplug callbacks run with interrupts disabled on UP
2025-12-13Merge tag 'perf-urgent-2025-12-12' of ↵Linus Torvalds4-18/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fixes from Ingo Molnar: - Fix NULL pointer dereference crash in the Intel PMU driver - Fix missing read event generation on task exit - Fix AMD uncore driver init error handling - Fix whitespace noise * tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common() perf/core: Fix missing read event generation on task exit perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error perf/uprobes: Remove <space><Tab> whitespace noise
2025-12-13Merge tag 'irq-urgent-2025-12-12' of ↵Linus Torvalds4-21/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: - Fix error code in the irqchip/mchp-eic driver - Fix setup_percpu_irq() affinity assumptions - Remove the unused irq_domain_add_tree() function * tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc() irqdomain: Delete irq_domain_add_tree() genirq: Allow NULL affinity for setup_percpu_irq()
2025-12-13Merge tag 'core-urgent-2025-12-12' of ↵Linus Torvalds2-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc core fixes from Ingo Molnar: - Improve bug reporting - Suppress W=1 format warning - Improve rseq scalability on Clang builds * tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Always inline rseq_debug_syscall_return() bug: Hush suggest-attribute=format for __warn_printf() bug: Let report_bug_entry() provide the correct bugaddr
2025-12-13Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of ↵Linus Torvalds22-44/+154
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "There are no significant series in this small merge. Please see the individual changelogs for details" [ Editor's note: it's mainly ocfs2 and a couple of random fixes ] * tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: memfd_luo: add CONFIG_SHMEM dependency mm: shmem: avoid build warning for CONFIG_SHMEM=n ocfs2: fix memory leak in ocfs2_merge_rec_left() ocfs2: invalidate inode if i_mode is zero after block read ocfs2: avoid -Wflex-array-member-not-at-end warning ocfs2: convert remaining read-only checks to ocfs2_emergency_state ocfs2: add ocfs2_emergency_state helper and apply to setattr checkpatch: add uninitialized pointer with __free attribute check args: fix documentation to reflect the correct numbers ocfs2: fix kernel BUG in ocfs2_find_victim_chain liveupdate: luo_core: fix redundant bound check in luo_ioctl() ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list fs/fat: remove unnecessary wrapper fat_max_cache() ocfs2: replace deprecated strcpy with strscpy ocfs2: check tl_used after reading it from trancate log inode liveupdate: luo_file: don't use invalid list iterator
2025-12-13Merge tag 'mm-stable-2025-12-11-11-39' of ↵Linus Torvalds9-97/+131
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand) fixes a couple of minor things in ppc land - "Improve folio split related functions" (Zi Yan) some cleanups and minorish fixes in the folio splitting code * tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/damon/tests/core-kunit: avoid damos_test_commit stack warning mm: vmscan: correct nr_requested tracing in scan_folios MAINTAINERS: add idr core-api doc file to XARRAY mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages() mm: fix CONFIG_STACK_GROWSUP typo in mm.h mm/huge_memory: fix folio split stats counting mm/huge_memory: make min_order_for_split() always return an order mm/huge_memory: replace can_split_folio() with direct refcount calculation mm/huge_memory: change folio_split_supported() to folio_check_splittable() mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION
2025-12-13file: ensure cleanupChristian Brauner1-7/+6
Brown paper bag time. This is a silly oversight where I missed to drop the error condition checking to ensure we clean up on early error returns. I have an internal unit testset coming up for this which will catch all such issues going forward. Reported-by: Chris Mason <clm@fb.com> Reported-by: Jeff Layton <jlayton@kernel.org> Fixes: 011703a9acd7 ("file: add FD_{ADD,PREPARE}()") Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-13x86/hv: Add gitignore entry for generated header fileLinus Torvalds1-0/+1
Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a new generated header file for the offsets into the mshv_vtl_cpu_context structure to be used by the low-level assembly code. But it didn't add the .gitignore file to go with it, so 'git status' and friends will mention it. Let's add the gitignore file before somebody thinks that generated header should be committed. Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-13Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds16-76/+166
Pull more drm fixes from Dave Airlie: "These are the enqueued fixes that ended up in our fixes branch, nouveau mostly, along with some small fixes in other places. plane: - Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties() ttm: - fix devcoredump for evicted bos panel: - Fix stack usage warning in novatek-nt35560 nouveau: - alloc fwsec sb at boot to avoid s/r problems - fix strcpy usage - fix i2c encoder crash bridge: - Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83 mgag200: - Fix bigendian handling in mgag200 tilcdc: - Fix probe failure in tilcdc" * tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel: drm/mgag200: Fix big-endian support drm/tilcdc: Fix removal actions in case of failed probe drm/ttm: Avoid NULL pointer deref for evicted BOs drm: nouveau: Replace sprintf() with sysfs_emit() drm/nouveau: fix circular dep oops from vendored i2c encoder drm/nouveau: refactor deprecated strcpy drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties() drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors drm/nouveau/gsp: Allocate fwsec-sb at boot drm/panel: novatek-nt35560: avoid on-stack device structure