Age | Commit message (Collapse) | Author | Files | Lines |
|
Unlike the Xe_HP platforms, MTL only has a single CCS engine; the
quad-based engine masking logic does not apply to this platform (or
presumably any future platforms that only have 0 or 1 CCS).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220818234202.451742-5-radhakrishna.sripada@intel.com
|
|
Host Turbo operates at efficient frequency when GT is not idle unless
the user or workload has forced it to a higher level. Replicate the same
behavior in SLPC by allowing the algorithm to use efficient frequency.
We had disabled it during boot due to concerns that it might break
kernel ABI for min frequency. However, this is not the case since
SLPC will still abide by the (min,max) range limits.
With this change, min freq will be at efficient frequency level at init
instead of fused min (RPn). If user chooses to reduce min freq below the
efficient freq, we will turn off usage of efficient frequency and honor
the user request. When a higher value is written, it will get toggled
back again.
The patch also corrects the register which needs to be read for obtaining
the correct efficient frequency for Gen9+.
We see much better perf numbers with benchmarks like glmark2 with
efficient frequency usage enabled as expected.
v2: Address review comments (Rodrigo)
v3: with efficient frequency being dynamic, it is possible that the req
frequency may go beyond max freq. This will cause SLPC selftests to fail.
Add a FIXME there to start the test with [RPn, RP0] instead and restore
it afterwards.
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/5468
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220820010832.15350-1-vinay.belgaumkar@intel.com
|
|
If it's modified runtime, it's runtime info.
mock_gem_device() is the only one that modifies it. If that could be
fixed, we wouldn't have to do this.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Maarten Lankhort <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1261406b373998c1a171ee9ed91f7f562826eba6.1660910433.git.jani.nikula@intel.com
|
|
If it's modified runtime, it's runtime info.
Curiously, the flag was never initialized statically.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Maarten Lankhort <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/db6d47abd87c74ae5f5be1cda62af13518c896fb.1660910433.git.jani.nikula@intel.com
|
|
If it's modified runtime, it's runtime info.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Maarten Lankhort <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9a9b94cb79a00229da5a564a16ea750a6d392ab6.1660910433.git.jani.nikula@intel.com
|
|
Commit 368d179adbac ("drm/i915/guc: Add GuC <-> kernel time stamp
translation information") added intel_device_info_print_runtime() in the
time info dump for no obvious reason or explanation in the commit
message. It only logs the rawclk freq. Remove it.
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b395ac4c909042f5daabf29959d8733993545aa2.1660910433.git.jani.nikula@intel.com
|
|
If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.
If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.
Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.
Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com
(cherry picked from commit f922fbb0f2ad1fd3e3186f39c46673419e6d9281)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Crucible + recent Mesa seems to sometimes hit:
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)
And it looks like we can also trigger this with gem_lmem_swapping, if we
modify the test to use slightly larger object sizes.
Looking closer it looks like we have the following issues in
migrate_copy():
- We are using plain integer in various places, which we can easily
overflow with a large object.
- We pass the entire object size (when the src is lmem) into
emit_pte() and then try to copy it, which doesn't work, since we
only have a few fixed sized windows in which to map the pages and
perform the copy. With an object > 8M we therefore aren't properly
copying the pages. And then with an object > 64M we trigger the
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).
So it looks like our copy handling for any object > 8M (which is our
CHUNK_SZ) is currently broken on DG2.
Fixes: da0595ae91da ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Testcase: igt@gem_lmem_swapping
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com
(cherry picked from commit 8676145eb2f53a9940ff70910caf0125bd8a4bc2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
to zero"
This reverts commit 6a079903847cce1dd06345127d2a32f26d2cd9c6.
Everything in CI using GuC is now timing out[1], and killing the machine
with this change (perhaps a deadlock?). CI was recently on fire due to
some changes coming in from -rc1, so likely the pre-merge CI results for
this series were invalid? For now just revert, unless GuC experts
already have a fix in mind.
[1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html?
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220819123904.913750-1-matthew.auld@intel.com
|
|
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.
As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.
Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.
Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).
Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.
Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.
Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- disable pci resize on 32-bit systems (Nirmoy)
- don't leak the ccs state (Matt)
- TLB invalidation fixes (Chris)
[now with all fixes of fixes]
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YvVumNCga+90fYN0@intel.com
|
|
If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.
If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.
Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.
Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com
|
|
The test needs GT reset to trigger the scrubbing logic, so we can only
run it when reset is enabled.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220708224158.929327-1-daniele.ceraolospurio@intel.com
|
|
Some debug code got left in when the GuC based register save for error
capture was added. Remove that.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-8-John.C.Harrison@Intel.com
|
|
The GuC log buffer sizes had to be configured statically at compile
time. This can be quite troublesome when needing to get larger logs
out of a released driver. So re-organise the code to allow a boot time
module parameter override.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-7-John.C.Harrison@Intel.com
|
|
Use a temporary page and mempy_from_wc to reduce the time it takes to
dump the guc log to debugfs.
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-6-John.C.Harrison@Intel.com
|
|
It is useful to be able to match GuC events to kernel events when
looking at the GuC log. That requires being able to convert GuC
timestamps to kernel time. So, when dumping error captures and/or GuC
logs, include a stamp in both time zones plus the clock frequency.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-4-John.C.Harrison@Intel.com
|
|
There was a size check to warn if the GuC error state capture buffer
allocation would be too small to fit a reasonable amount of capture
data for the current platform. Unfortunately, the test was done too
early in the boot sequence and was actually testing 'if(-ENODEV >
size)'.
Move the check to be later. The check is only used to print a warning
message, so it doesn't really matter how early or late it is done.
Note that it is not possible to dynamically size the buffer because
the allocation needs to be done before the engine information is
available (at least, it would be in the intended two-phase GuC init
process).
Now that the check works, it is reporting size too small for newer
platforms. The check includes a 3x oversample multiplier to allow for
multiple error captures to be bufferd by GuC before i915 has a chance
to read them out. This is less important than simply being big enough
to fit the first capture.
So a) bump the default size to be large enough for one capture minimum
and b) make the warning only if one capture won't fit, instead use a
notice for the 3x size.
Note that the size estimate is a worst case scenario. Actual captures
will likely be smaller.
Lastly, use drm_warn istead of DRM_WARN as the former provides more
infmration and the latter is deprecated.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-3-John.C.Harrison@Intel.com
|
|
Add a helper to get GuC log buffer size.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-2-John.C.Harrison@Intel.com
|
|
Some additional MMIO tuning settings have appeared in the bspec's
performance tuning guide section.
One of the tuning settings here is also documented as formal workaround
Wa_22012654132 for some steppings of DG2. However the tuning setting
applies to all DG2 variants and steppings, making it a superset of the
workaround.
v2:
- Move DRAW_WATERMARK to engine workaround section. It only moves into
the engine context on future platforms. (Lucas)
- CHICKEN_RASTER_2 needs to be handled as a masked register. (Lucas)
Bspec: 68331
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220816210601.2041572-2-matthew.d.roper@intel.com
|
|
The bspec performance tuning section gives recommended settings that the
driver should program for various MMIO registers. Although these
settings aren't "workarounds" we use the workaround infrastructure to do
this programming to make sure it is handled at the appropriate places
and doesn't conflict with any real workarounds.
Since more of these are starting to show up on recent platforms, it's a
good time to create a dedicated function to hold them so that there's
less ambiguity about how/where to implement new ones.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220816210601.2041572-1-matthew.d.roper@intel.com
|
|
For proper operation of i915 we need usable PCI GTTMMADDR BAR 0
(1 for GEN2). In most cases we also need usable PCI GFXMEM BAR 2.
Let's add functions to check if BARs are set, and that it have
a size greater than 0.
In case GTTMMADDR BAR, let's validate at the beginning of i915
initialization.
For other BARs, let's validate before first use.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805155959.1983584-3-piotr.piorkowski@intel.com
|
|
At the moment, when we refer to some PCI BAR we use the number of
this BAR in the code. The meaning of BARs between different platforms
may be different. Therefore, in order to organize the code,
let's start using defined names instead of numbers.
v2: Add lost header in cfg_space.c
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805155959.1983584-2-piotr.piorkowski@intel.com
|
|
Crucible + recent Mesa seems to sometimes hit:
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)
And it looks like we can also trigger this with gem_lmem_swapping, if we
modify the test to use slightly larger object sizes.
Looking closer it looks like we have the following issues in
migrate_copy():
- We are using plain integer in various places, which we can easily
overflow with a large object.
- We pass the entire object size (when the src is lmem) into
emit_pte() and then try to copy it, which doesn't work, since we
only have a few fixed sized windows in which to map the pages and
perform the copy. With an object > 8M we therefore aren't properly
copying the pages. And then with an object > 64M we trigger the
GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).
So it looks like our copy handling for any object > 8M (which is our
CHUNK_SZ) is currently broken on DG2.
Fixes: da0595ae91da ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Testcase: igt@gem_lmem_swapping
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com
|
|
We only ever need to emit one ccs block copy command.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-1-matthew.auld@intel.com
|
|
WRITE_ONCE() should happen at the original var, not on a local
copy of it.
Cc: stable@vger.kernel.org
Fixes: 59eda6ce824e ("drm/i915/gt: Batch TLB invalidations")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[added cc-stable while merging it]
Link: https://patchwork.freedesktop.org/patch/msgid/f9550e6bacea10131ff40dd8981b69eb9251cdcd.1659598090.git.mchehab@kernel.org
(cherry picked from commit 3d037d99e61a1e7a3ae3d214146d88db349dd19f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Invalidate TLB in batches, in order to reduce performance regressions.
Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.
We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.
That helps to reduce the performance regression introduced by TLB
invalidate logic.
[mchehab: rebased to not require moving the code to a separate file]
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org
(cherry picked from commit 5d36acb7198b0e5eb88e6b701f9ad7b9448f8df9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
WRITE_ONCE() should happen at the original var, not on a local
copy of it.
Cc: stable@vger.kernel.org
Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[added cc-stable while merging it]
Link: https://patchwork.freedesktop.org/patch/msgid/f9550e6bacea10131ff40dd8981b69eb9251cdcd.1659598090.git.mchehab@kernel.org
|
|
Skip all further TLB invalidations once the device is wedged and
had been reset, as, on such cases, it can no longer process instructions
on the GPU and the user no longer has access to the TLB's in each engine.
So, an attempt to do a TLB cache invalidation will produce a timeout.
That helps to reduce the performance regression introduced by TLB
invalidate logic.
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5aa86564b9ec5fe7fe605c1dd7de76855401ed73.1658924372.git.mchehab@kernel.org
(cherry picked from commit be0366f168033374a93e4c43fdaa1a90ab905184)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/59724d9f5cf1e93b1620d01b8332ac991555283d.1658924372.git.mchehab@kernel.org
(cherry picked from commit dfc83de118ff7930acc9a4c8dfdba7c153aa44d6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.
This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/278a57a672edac75683f0818b292e95da583a5fe.1658924372.git.mchehab@kernel.org
(cherry picked from commit 4bedceaed1ae1172cfe72d3ff752b3a1d32fe4d9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The kernel only manages the ccs state with lmem-only objects, however
the kernel should still take care not to leak the CCS state from the
previous user.
Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727164346.282407-1-matthew.auld@intel.com
(cherry picked from commit 353819d85f87be46aeb9c1dd929d445a006fc6ec)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
PCI bar resize only works with 64 bit BAR so disable
this on 32-bit machine and resolve below compilation error:
drivers/gpu/drm/i915/gt/intel_region_lmem.c:94:23: error: result of
comparison of constant 4294967296 with expression of type
'resource_size_t' (aka 'unsigned int') is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
root_res->start > 0x100000000ull)
Fixes: a91d1a17cd341 ("drm/i915: Add support for LMEM PCIe resizable bar")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727173306.16247-1-nirmoy.das@intel.com
(cherry picked from commit f5dfbfc0ae00c2c2c0518da9e1f9a8cca50ae544)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Pull drm updates from Dave Airlie:
"Highlights:
- New driver for logicvc - which is a display IP core.
- EDID parser rework to add new extensions
- fbcon scrolling improvements
- i915 has some more DG2 work but not enabled by default, but should
have enough features for userspace to work now.
Otherwise it's lots of work all over the place. Detailed summary:
New driver:
- logicvc
vfio:
- use aperture API
core:
- of: Add data-lane helpers and convert drivers
- connector: Remove deprecated ida_simple_get()
media:
- Add various RGB666 and RGB888 format constants
panel:
- Add HannStar HSD101PWW
- Add ETML0700Y5DHA
dma-buf:
- add sync-file API
- set dma mask for udmabuf devices
fbcon:
- Improve scrolling performance
- Sanitize input
fbdev:
- device unregistering fixes
- vesa: Support COMPILE_TEST
- Disable firmware-device registration when first native driver loads
aperture:
- fix segfault during hot-unplug
- export for use with other subsystems
client:
- use driver validated modes
dp:
- aux: make probing more reliable
- mst: Read extended DPCD capabilities during system resume
- Support waiting for HDP signal
- Port-validation fixes
edid:
- CEA data-block iterators
- struct drm_edid introduction
- implement HF-EEODB extension
gem:
- don't use fb format non-existing planes
probe-helper:
- use 640x480 as displayport fallback
scheduler:
- don't kill jobs in interrupt context
bridge:
- Add support for i.MX8qxp and i.MX8qm
- lots of fixes/cleanups
- Add TI-DLPC3433
- fy07024di26a30d: Optional GPIO reset
- ldb: Add reg and reg-name properties to bindings, Kconfig fixes
- lt9611: Fix display sensing;
- tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling
- tc358775: Fix clock settings
- ti-sn65dsi83: Allow GPIO to sleep
- adv7511: I2C fixes
- anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback
- fsl-ldb: Drop DE flip
- ti-sn65dsi86: Convert to atomic modesetting
amdgpu:
- use atomic fence helpers in DM
- fix VRAM address calculations
- export CRTC bpc via debugfs
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11 / SDMA 6
- Add gfxoff status query for vangogh
- Fix timestamps for cursor only commits
- Adjust GART size on newer APUs for S/G display
- fix buddy memory corruption
amdkfd:
- MMU notifier fixes
- P2P DMA support using dma-buf
- Add available memory IOCTL
- HMM profiler support
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
i915:
- General driver clean-up
- DG2 enabling (still under force probe)
- DG2 small BAR memory support
- HuC loading support
- DG2 workarounds
- DG2/ATS-M device IDs added
- Ponte Vecchio prep work and new blitter engines
- add Meteorlake support
- Fix sparse warnings
- DMC MMIO range checks
- Audio related fixes
- Runtime PM fixes
- PSR fixes
- Media freq factor and per-gt enhancements
- DSI fixes for ICL+
- Disable DMC flip queue handlers
- ADL_P voltage swing updates
- Use more the VBT for panel information
- Fix on Type-C ports with TBT mode
- Improve fastset and allow seamless M/N changes
- Accept more fixed modes with VRR/DMRRS panels
- Disable connector polling for a headless SKU
- ADL-S display PLL w/a
- Enable THP on Icelake and beyond
- Fix i915_gem_object_ggtt_pin_ww regression on old platforms
- Expose per tile media freq factor in sysfs
- Fix dma_resv fence handling in multi-batch execbuf
- Improve on suspend / resume time with VT-d enabled
- export CRTC bpc settings via debugfs
msm:
- gpu: a619 support
- gpu: Fix for unclocked GMU register access
- gpu: Devcore dump enhancements
- client utilization via fdinfo support
- fix fence rollover issue
- gem: Lockdep false-positive warning fix
- gem: Switch to pfn mappings
- WB support on sc7180
- dp: dropped custom bulk clock implementation
- fix link retraining on resolution change
- hdmi: dropped obsolete GPIO support
tegra:
- context isolation for host1x engines
- tegra234 soc support
mediatek:
- add vdosys0/1 for mt8195
- add MT8195 dp_intf driver
exynos:
- Fix resume function issue of exynos decon driver by calling
clk_disable_unprepare() properly if clk_prepare_enable() failed.
nouveau:
- set of misc fixes/cleanups
- display cleanups
gma500:
- Cleanup connector I2C handling
hyperv:
- Unify VRAM allocation of Gen1 and Gen2
meson:
- Support YUV422 output; Refcount fixes
mgag200:
- Support damage clipping
- Support gamma handling
- Protect concurrent HW access
- Fixes to connector
- Store model-specific limits in device-info structure
- fix PCI register init
panfrost:
- Valhall support
r128:
- Fix bit-shift overflow
rockchip:
- Locking fixes in error path
ssd130x:
- Fix built-in linkage
udl:
- Always advertize VGA connector
ast:
- Support multiple outputs
- fix black screen on resume
sun4i:
- HDMI PHY cleanups
vc4:
- Add support for BCM2711
vkms:
- Allocate output buffer with vmalloc()
mcde:
- Fix ref-count leak
mxsfb/lcdif:
- Support i.MX8MP LCD controller
stm/ltdc:
- Support dynamic Z order
- Support mirroring
ingenic:
- Fix display at maximum resolution"
* tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm: (1480 commits)
drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code
drm/amdgpu: enable support for psp 13.0.4 block
drm/amdgpu: add files for PSP 13.0.4
drm/amdgpu: add header files for MP 13.0.4
drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index
drm/amdgpu: send msg to IMU for the front-door loading
drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"
drm/amdgpu: fix hive reference leak when reflecting psp topology info
drm/amd/pm: enable GFX ULV feature support for SMU13.0.0
drm/amd/pm: update driver if header for SMU 13.0.0
drm/amdgpu: move mes self test after drm sched re-started
drm/amdgpu: drop non-necessary call trace dump
drm/amdgpu: enable VCN cg and JPEG cg/pg
drm/amdgpu: vcn_4_0_2 video codec query
drm/amdgpu: add VCN_4_0_2 firmware support
drm/amdgpu: add VCN function in NBIO v7.7
drm/amdgpu: fix a vcn4 boot poll bug in emulation mode
drm/amd/amdgpu: add memory training support for PSP_V13
drm/amdkfd: remove an unnecessary amdgpu_bo_ref
drm/amd/pm: Add get_gfx_off_status interface for yellow carp
...
|
|
Bspec: 46052
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220801213839.8549-1-harish.chegondi@intel.com
|
|
New release of GuC with a bunch of fixes specific to DG2. Some of
these require follow up i915 changes to enable.
Note also that it is not necessary to maintain backwards compatibility
with 70.1.2 for DG2 because DG2 is still under force probe protection.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728230722.2749701-2-John.C.Harrison@Intel.com
|
|
The GuC FW applies the parent context policy to all the children,
so individual updates to the children are not supported and we
should not send them.
Note that sending the message did not have any functional consequences,
because the GuC just drops it and logs an error; since we were trying
to set the child policy to match the parent anyway the message being
dropped was not a problem.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728003339.2361010-1-daniele.ceraolospurio@intel.com
|
|
When the KMD sends a CLIENT_RESET request to GuC (as part of the
suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the
KMD then checked the CTB queue, it would see a non-zero status value
and report the buffer as corrupted.
Technically, no G2H messages should be received once the CLIENT_RESET
has been sent. However, if a context was outstanding on an engine then
it would get reset and a reset notification would be sent. So, don't
actually treat UNUSED as a catastrophic error. Just flag it up as
unexpected and keep going.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-7-John.C.Harrison@Intel.com
|
|
The GuC needs a copy of a golden context for implementing watchdog
resets (aka media resets). This context is larger on newer platforms.
So adjust the size being allocated/copied accordingly.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-6-John.C.Harrison@Intel.com
|
|
It is no longer guaranteed that there will always be an RCS engine.
So, use the helper function for finding the first available engine that
can be used for general purpose selftets.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-5-John.C.Harrison@Intel.com
|
|
Add a test to check that the hangcheck will recover from a submission
hang in the GuC.
Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728182616.2417491-1-John.C.Harrison@Intel.com
|
|
Having semaphores results in different behavior when a dependent request
is cancelled. In the case of semaphores the request could be on the HW
and complete successfully while without the request is held in the
driver and the error from the dependent request is propagated. Fix
live_preempt_cancel to take this behavior into account.
Also update live_preempt_cancel to use new function intel_context_ban
rather than intel_context_set_banned.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-3-John.C.Harrison@Intel.com
|
|
In GuC submission mode, there is an option to use auto-switch out
semaphores and have GuC auto-switch in a waiting context. This
requires routing the semaphore interrupt to GuC.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-2-John.C.Harrison@Intel.com
|
|
We are seeing error message of "No response for request". Some cases
happened while waiting for response and reset/suspend action was triggered.
In this case, no response is not an error, active requests will be
cancelled.
This patch will handle this condition and change the error message into
debug message.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220715211313.143645-1-zhanjun.dong@intel.com
|
|
Invalidate TLB in batches, in order to reduce performance regressions.
Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.
We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.
That helps to reduce the performance regression introduced by TLB
invalidate logic.
[mchehab: rebased to not require moving the code to a separate file]
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org
|
|
Skip all further TLB invalidations once the device is wedged and
had been reset, as, on such cases, it can no longer process instructions
on the GPU and the user no longer has access to the TLB's in each engine.
So, an attempt to do a TLB cache invalidation will produce a timeout.
That helps to reduce the performance regression introduced by TLB
invalidate logic.
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5aa86564b9ec5fe7fe605c1dd7de76855401ed73.1658924372.git.mchehab@kernel.org
|
|
Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/59724d9f5cf1e93b1620d01b8332ac991555283d.1658924372.git.mchehab@kernel.org
|
|
Add a kernel-doc markup to document this new macro.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b974905bd0f6b5308b91561cc85eeecd94f1452a.1658924372.git.mchehab@kernel.org
|
|
Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.
This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.
Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/278a57a672edac75683f0818b292e95da583a5fe.1658924372.git.mchehab@kernel.org
|
|
The kernel only manages the ccs state with lmem-only objects, however
the kernel should still take care not to leak the CCS state from the
previous user.
Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727164346.282407-1-matthew.auld@intel.com
|