Age | Commit message (Collapse) | Author | Files | Lines |
|
Messaging to GuC may get canceled when device is wedged. Don't
flag this as an error in xe_guc_pc code.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-1-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
functions, the CI test systems are going out of space and crashing. To
avoid this issue, a new helper function is created and when fault is
injected into this xe_is_injection_active() helper function, ct dead
capture is avoided which suppresses ct dumps in the log.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250612080402.22011-1-satyanarayana.k.v.p@intel.com
|
|
When the GuC fails to load we declare the device wedged. However, the
very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done
before the GT1 GuC objects are initialized, so things go bad when the
wedge code attempts to cleanup GT1. To fix this, check the initialization
status in the functions called during wedge.
Fixes: 7dbe8af13c18 ("drm/xe: Wedge the entire device")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: stable@vger.kernel.org # v6.12+: 1e1981b16bb1: drm/xe: Fix taking invalid lock on wedge
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
No idea why, but without this GuC context switches randomly fail when
running IGTs in a loop. Need to follow up why this fixes the
aforementioned issue but can live with a stable driver for now.
Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com
|
|
Updating range->tile_invalidated should be done with WRITE_ONCE to pair
with READ_ONCE in opportunistic checks.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhrost <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250604234712.2441130-1-matthew.brost@intel.com
|
|
Only the VM dma-resv lock is needed in SVM pagefaults so
xe_vm_lock/unlock can be used instead of drm exec. Micro optimization
but should save some CPU cycles in a critical path.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250603174012.2195759-1-matthew.brost@intel.com
|
|
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit ef48715b2d3df17c060e23b9aa636af3d95652f8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The post context restore (WA BB) is a mechanism in HW that may be used
for things other than the utilization setup. Create a new function
called setup_wa_bb() that wraps any function writing useful commands in
the buffer.
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-2-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes: 82b98cadb01f ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Add another GMD_ID for Xe2_HPG
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250605190804.1287289-4-dnyaneshwar.bhadane@intel.com
|
|
Add set of workarounds for xe2_hpg.
-v2: Fix xe2_hpg GMD version for some workarounds.
-v3: Removed extra Workaround (Matt Roper)
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250605190804.1287289-3-dnyaneshwar.bhadane@intel.com
|
|
A number of files have unnecessary i915_reg.h includes. Drop them.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://lore.kernel.org/r/7c4002322f4d8132fd2eaa1a4d688539cdd043c3.1749469962.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
This reverts commit 5a9f299f956ef9764f56044cfca7aafa23cea1d1.
The following crash/regression was seen with the reverted commit
on a specific BMG SKU with no display capabilities:
[ 115.582833] BUG: kernel NULL pointer dereference, address: 00000000000005d0
[ 115.589775] #PF: supervisor write access in kernel mode
[ 115.594976] #PF: error_code(0x0002) - not-present page
[ 115.600088] PGD 0 P4D 0
[ 115.602617] Oops: Oops: 0002 [#1] SMP
[ 115.606267] CPU: 14 UID: 0 PID: 1547 Comm: kworker/14:3 Tainted: G U E 6.15.0-local+ #62 PREEMPT(voluntary)
[ 115.617332] Tainted: [U]=USER, [E]=UNSIGNED_MODULE
[ 115.622100] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPEMI1.R00.3471.D49.2401260852 01/26/2024
[ 115.635314] Workqueue: pm pm_runtime_work
[ 115.639309] RIP: 0010:_raw_spin_lock+0x17/0x30
[ 115.662382] RSP: 0018:ffffd13f82e7bc30 EFLAGS: 00010246
[ 115.667581] RAX: 0000000000000000 RBX: ffff8be919076000 RCX: 0000000000000002
[ 115.674675] RDX: 0000000000000001 RSI: 000000000000004b RDI: 00000000000005d0
[ 115.681775] RBP: ffffd13f82e7bc60 R08: ffffd13f82e7bb00 R09: ffff8beb0c1b06c0
[ 115.688869] R10: ffff8be7c034f4c0 R11: fefefefefefefeff R12: fffffffffffffff0
[ 115.695965] R13: ffff8be9190762e8 R14: ffff8be919077798 R15: 00000000000005d0
[ 115.703062] FS: 0000000000000000(0000) GS:ffff8beb552b6000(0000) knlGS:0000000000000000
[ 115.711106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.716826] CR2: 00000000000005d0 CR3: 000000024c68d002 CR4: 0000000000f72ef0
[ 115.723921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 115.731015] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 115.738113] PKRU: 55555554
[ 115.740816] Call Trace:
[ 115.743258] <TASK>
[ 115.745363] ? xe_display_flush_cleanup_work+0x92/0x120 [xe]
[ 115.751102] xe_display_pm_runtime_suspend+0x42/0x80 [xe]
[ 115.756542] xe_pm_runtime_suspend+0x11b/0x1b0 [xe]
[ 115.761463] xe_pci_runtime_suspend+0x23/0xd0 [xe]
[ 115.766291] pci_pm_runtime_suspend+0x6b/0x1a0
[ 115.770717] ? pci_pm_thaw_noirq+0xa0/0xa0
[ 115.774797] __rpm_callback+0x48/0x1e0
[ 115.778531] ? pci_pm_thaw_noirq+0xa0/0xa0
[ 115.782614] rpm_callback+0x66/0x70
[ 115.786090] ? pci_pm_thaw_noirq+0xa0/0xa0
[ 115.790173] rpm_suspend+0xe1/0x5e0
[ 115.793647] ? psi_task_switch+0xb8/0x200
[ 115.797643] ? finish_task_switch.isra.0+0x8d/0x270
[ 115.802502] pm_runtime_work+0xa6/0xc0
[ 115.806238] process_one_work+0x186/0x350
[ 115.810234] worker_thread+0x33a/0x480
[ 115.813968] ? process_one_work+0x350/0x350
[ 115.818132] kthread+0x10c/0x220
[ 115.821350] ? kthreads_online_cpu+0x120/0x120
[ 115.825774] ret_from_fork+0x3a/0x60
[ 115.829339] ? kthreads_online_cpu+0x120/0x120
[ 115.833768] ret_from_fork_asm+0x11/0x20
[ 115.829339] ? kthreads_online_cpu+0x120/0x120
[ 115.833768] ret_from_fork_asm+0x11/0x20
[ 115.837680] </TASK>
[ 115.839907] acpi_tad(E) drm(E)
[ 115.931629] CR2: 00000000000005d0
[ 115.934935] ---[ end trace 0000000000000000 ]---
[ 115.939531] RIP: 0010:_raw_spin_lock+0x17/0x30
We cannot yet use xe->display to determine whether display hardware
has been successfully probed/initialized or not. This is because
xe->display would not be set to NULL even with GPUs with no display
capabilities (e.g, GMD_ID_DISPLAY = 0). However, this might change
in the future as Xe and i915 code is unified to deal with no display
cases.
Therefore, for now we have to continue to rely on xe->info.probe_display
(which would be set to false with display-less GPUs) to decide
whether to invoke any display related functions or not.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20250605054247.386633-1-vivek.kasireddy@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Backmerging to forward to v6.16-rc1
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Print the error from get pages failing, not the cast to -ENODATA.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>>
Link: https://lore.kernel.org/r/20250610045649.3149801-1-matthew.brost@intel.com
|
|
Backmerging to bring in 6.16
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
On old Intel platforms, the size of the GSM (i.e., the stolen memory
that holds the GGTT page table entries) could vary, so the driver needed
to read the actual size from the PCI config space. However from Xe_HP
onward, the GSM is now always guaranteed to be exactly 8MB (which
translates to a 4GB GGTT address space); this is always true regardless
of what the platform's much larger PPGTT address space is.
The bspec doesn't document the PCI config space as being a valid way to
query the size of the GSM after Xe_LP platforms, although so far it
still seems to be giving us proper values for Xe_HP, Xe2, and Xe3.
However we suspect that the config space will stop providing correct
values on some upcoming platforms, so we should stop relying on it.
Instead just use the hardcoded 8MB value as documented elsewhere in the
bspec.
Bspec: 49636, 67090, 50589
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20250605225352.2333981-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
When changing the condition from >= SZ_64K, it was changed to <= SZ_64K.
This disallows migration of 64K, which is the exact minimum allowed.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5057
Fixes: 794f5493f518 ("drm/xe: Strict migration policy for atomic SVM faults")
Cc: stable@vger.kernel.org
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250521090102.2965100-1-dev@lankhorst.se
(cherry picked from commit 531bef26d189b28bf0d694878c0e064b30990b6c)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
We are already prepared to define firmwares per-GT type, so we
should also prepare our messages to be GT-oriented.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250606204311.813-1-michal.wajdeczko@intel.com
|
|
This is a scripted split of the display related register macros from
i915_reg.h to display/intel_display_regs.h. As a starting point, move
all the macros that are only used in display code (or GVT). If there are
users in core i915 code or soc/, or no users anywhere, keep the macros
in i915_reg.h. This is done in groups of macros separated by blank
lines, moving the comments along with the groups.
Some manually picked macro groups are kept/moved regardless of the
heuristics above.
This is obviously a very crude approach. It's not perfect. But there are
4.2k lines in i915_reg.h, and its refactoring has ground to a halt. This
is the big hammer that splits the file to two, and enables further
cleanup.
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> # v2
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250606102256.2080073-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Sync to v6.16-rc1, among other things to get the fixed size GENMASK_U*()
and BIT_U*() macros.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Add a function to init ggtt for kunit, and use the GGTT function for
initialising the GGTT node without populating it. This
prevents the test from ever knowing about struct xe_ggtt.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250505121924.921544-11-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Split the GGTT PTE readout to a separate function, this is useful for
adding testcases in the next commit, and also cleaner than manually
reading out GGTT.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://lore.kernel.org/r/20250505121924.921544-10-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
The users inside display have been converted to use thepte_encode_flags
callback, we can now remove the pte_encode_bo cb.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://lore.kernel.org/r/20250505121924.921544-9-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Another small step in removing pte_encode_bo callback.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://lore.kernel.org/r/20250505121924.921544-8-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
For DPT, it is sufficient to get the GGTT encode flags to fill the DPT.
Create a function to return the encode flags, and then encode using the
BO address.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://lore.kernel.org/r/20250505121924.921544-7-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Pinning large linear display framebuffers is becoming a bottleneck.
My plan of attack is doing a custom walk over the BO, this allows for
easier optimization of consecutive entries.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250505121924.921544-6-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Obtain the id from the root tile. Likely this can be hardcoded to 0,
but use the clean solution of obtaining root id and doing that.
to_xe_device(ggtt->tile) can also be easily replaced with xe.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250505121924.921544-5-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Instead of allocating inside xe_tile, create a new function that returns
an allocated struct xe_ggtt from xe_ggtt.c
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250505121924.921544-4-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Another requirement of hiding more of struct xe_ggtt.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250505121924.921544-3-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
This is the first step to hide the details of struct xe_ggtt.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250505121924.921544-2-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
IOSF_MBI was only useful for some gen8 platforms, which were never
supported by Xe. Presumably needed for display at one point, but display
is fixed to put stubs in compat-i915-headers/vlv_sideband.h. (in
drm-intel-next: vlv_iosf_sb.h)
Link: https://lore.kernel.org/r/20250605074644.71036-1-dev@lankhorst.se
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
We shouldn't ever pass more DSS registers than our hardcoded limit,
it should be sufficient to just assert that instead of trying to
fix it, as this will never happen in the production driver.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250604202908.769-4-michal.wajdeczko@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Since we keep registers in the array we can simply count them and
stop relying on magic number when checking if didn't make mistake.
Also we can switch to use xe_gt_assert() since it could be just
our programming mistake during platform bringup, no need to keep
drm_WARN() in the production driver.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250604202908.769-3-michal.wajdeczko@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Instead of passing registers using va_list we can keep them in
the static array and pass as such and also lower driver footprint:
add/remove: 2/0 grow/shrink: 0/2 up/down: 24/-175 (-151)
Function old new delta
geometry_regs - 12 +12
compute_regs - 12 +12
xe_gt_topology_init 550 527 -23
load_dss_mask 449 297 -152
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250604202908.769-2-michal.wajdeczko@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
This device pointer is nearly always available without storing
an extra copy for each tt in the system.
Just noticed this while reading over the xe shrinker code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250605062103.1234620-1-airlied@gmail.com
|
|
The GuC compatibility version that we read from the CSS header in
native/PF and the GuC VF version that we get when a VF handshakes with
the GuC are the same version number, so we should store it into the same
structure. This makes all the checks based on the compatibility version
automatically work for VFs without having to copy the value over.
For completion, also copy the wanted version and set the path to a known
string to indicate that the FW is PF-loaded. This way all the info will
be coherent when dumped from debugfs.
v2: several code cleanups and style changes (Michal), rebase on
bootstrap changes.
v3: s/min/wanted/, clarify that handshake must happen before we can get
the VF versions (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-10-daniele.ceraolospurio@intel.com
|
|
Instead of using a VF-specific type, we can use the common uc_fw_version
structure. This also means that we can use the available macros to
compare ABI versions.
While at it, exit early from the bootstrap if this is not the first time
we're doing it and the version hasn't changed, so we don't end up
logging it multiple times.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-9-daniele.ceraolospurio@intel.com
|
|
Currently we perform the bootstrap for the primary GT early on during
device init, while the media GT bootstrap happens when we try and fetch
the hwconfig table. For consistency, move the bootstrap of the media GT
happen at the same time as the primary GT, so that all the subsequent
code can rely on both GTs being in the same state.
v2: Also drop config query from min_guc_load since we now do it
early (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-8-daniele.ceraolospurio@intel.com
|
|
The VF ABI version has a branch field, so to store it inside the
uc_fw_version we need to add a new branch variable to the latter.
Existing code needs to be updated to handle the fact that we have the
new field.
v2: split out to its own patch (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-7-daniele.ceraolospurio@intel.com
|
|
Pull drm fixes from Dave Airlie:
"This is pretty much two weeks worth of fixes, plus one thing that
might be considered next: amdkfd is now able to be enabled on risc-v
platforms.
Otherwise, amdgpu and xe with the majority of fixes, and then a
smattering all over.
panel:
- nt37801: fix IS_ERR
- nt37801: fix KConfig
connector:
- Fix null deref in HDMI audio helper.
bridge:
- analogix_dp: fixup clk-disable removal
nouveau:
- minor typo fix (',' vs ';')
msm:
- mailmap updates
i915:
- Fix the enabling/disabling of DP audio SDP splitting
- Fix PSR register definitions for ALPM
- Fix u32 overflow in SNPS PHY HDMI PLL setup
- Fix GuC pending message underflow when submit fails
- Fix GuC wakeref underflow race during reset
xe:
- Two documentation fixes
- A couple of vm init fixes
- Hwmon fixes
- Drop reduntant conversion to bool
- Fix CONFIG_INTEL_VSEC dependency
- Rework eviction rejection of bound external bos
- Stop re-submitting signalled jobs
- A couple of pxp fixes
- Add back a fix that got lost in a merge
- Create LRC bo without VM
- Fix for the above fix
amdgpu:
- UserQ fixes
- SMU 13.x fixes
- VCN fixes
- JPEG fixes
- Misc cleanups
- runtime pm fix
- DCN 4.0.1 fixes
- Misc display fixes
- ISP fix
- VRAM manager fix
- RAS fixes
- IP discovery fix
- Cleaner shader fix for GC 10.1.x
- OD fix
- Non-OLED panel fix
- Misc display fixes
- Brightness fixes
amdkfd:
- Enable CONFIG_HSA_AMD on RISCV
- SVM fix
- Misc cleanups
- Ref leak fix
- WPTR BO fix
radeon:
- Misc cleanups"
* tag 'drm-next-2025-06-06' of https://gitlab.freedesktop.org/drm/kernel: (105 commits)
drm/nouveau/vfn/r535: Convert comma to semicolon
drm/xe: remove unmatched xe_vm_unlock() from __xe_exec_queue_init()
drm/xe: Create LRC BO without VM
drm/xe/guc_submit: add back fix
drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready
drm/xe/pxp: Use the correct define in the set_property_funcs array
drm/xe/sched: stop re-submitting signalled jobs
drm/xe: Rework eviction rejection of bound external bos
drm/xe/vsec: fix CONFIG_INTEL_VSEC dependency
drm/xe: drop redundant conversion to bool
drm/xe/hwmon: Move card reactive critical power under channel card
drm/xe/hwmon: Add support to manage power limits though mailbox
drm/xe/vm: move xe_svm_init() earlier
drm/xe/vm: move rebind_work init earlier
MAINTAINERS: .mailmap: update Rob Clark's email address
mailmap: Update entry for Akhil P Oommen
MAINTAINERS: update my email address
MAINTAINERS: drop myself as maintainer
drm/i915/display: Fix u32 overflow in SNPS PHY HDMI PLL setup
drm/amd/display: Fix default DC and AC levels
...
|
|
Set DIS_NULL_QUERY bit of RT_CTRL register to disable
null query for anyhit shader for Xe3 IP.
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://lore.kernel.org/r/20250605100812.2547808-1-nitin.r.gote@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
There is unmatched xe_vm_unlock() in the __xe_exec_queue_init().
Leftover from commit fbeaad071a98 ("drm/xe: Create LRC BO without VM")
Fixes: 2b0a0ce0c20b ("drm/xe: Create LRC BO without VM")
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://lore.kernel.org/r/20250530135627.2821612-1-maciej.patelczyk@intel.com
(cherry picked from commit 28b996ce73982a44fa86736ca0e3684cb1ae8b24)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Specifying VM during lrc->bo creation requires VM's reference
to be held for the lifetime of lrc->bo as it will use VM's dma
reservation object. Using VM's dma reservation object for
lrc->bo doesn't provide any advantage. Hence do not pass VM
while creating lrc->bo.
v2: Use xe_bo_unpin_map_no_vm (Matthew Brost)
Fixes: 264eecdba211 ("drm/xe: Decouple xe_exec_queue and xe_lrc")
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit fbeaad071a98fef87deccee81d564de1c8e8e16d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Daniele noticed that the fix in commit 2d2be279f1ca ("drm/xe: fix UAF
around queue destruction") looks to have been unintentionally removed as
part of handling a conflict in some past merge commit. Add it back.
Fixes: ac44ff7cec33 ("Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes")
Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250603174213.1543579-2-matthew.auld@intel.com
(cherry picked from commit 9d9fca62dc49d96f97045b6d8e7402a95f8cf92a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
(cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The define of the extension type was accidentally used instead of the
one of the property itself. They're both zero, so no functional issue,
but we should use the correct define for code correctness.
Fixes: 41a97c4a1294 ("drm/xe/pxp/uapi: Add API to mark a BO as using PXP")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-6-daniele.ceraolospurio@intel.com
(cherry picked from commit 1d891ee820fd0fbb4101eacb0d922b5050a24933)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3
xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
.....
xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
(cherry picked from commit 38fafa9f392f3110d2de431432d43f4eef99cd1b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
For preempt_fence mode VM's we're rejecting eviction of
shared bos during VM_BIND. However, since we do this in the
move() callback, we're getting an eviction failure warning from
TTM. The TTM callback intended for these things is
eviction_valuable().
However, the latter doesn't pass in the struct ttm_operation_ctx
needed to determine whether the caller needs this.
Instead, attach the needed information to the vm under the
vm->resv, until we've been able to update TTM to provide the
needed information. And add sufficient lockdep checks to prevent
misuse and races.
v2:
- Fix a copy-paste error in xe_vm_clear_validating()
v3:
- Fix kerneldoc errors.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 0af944f0e308 ("drm/xe: Reject BO eviction if BO is bound to current VM")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 9d5558649f68e2e84a87a909631b30e15ca0f8ec)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The XE driver can be built with or without VSEC support, but fails to link as
built-in if vsec is in a loadable module:
x86_64-linux-ld: vmlinux.o: in function `xe_vsec_init':
(.text+0x1e83e16): undefined reference to `intel_vsec_register'
The normal fix for this is to add a 'depends on INTEL_VSEC || !INTEL_VSEC',
forcing XE to be a loadable module as well, but that causes a circular
dependency:
symbol DRM_XE depends on INTEL_VSEC
symbol INTEL_VSEC depends on X86_PLATFORM_DEVICES
symbol X86_PLATFORM_DEVICES is selected by DRM_XE
The problem here is selecting a symbol from another subsystem, so change
that as well and rephrase the 'select' into the corresponding dependency.
Since X86_PLATFORM_DEVICES is 'default y', there is no change to
defconfig builds here.
Fixes: 0c45e76fcc62 ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250529172355.2395634-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit e4931f8be347ec5f19df4d6d33aea37145378c42)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|