kernel/linux.git/drivers/gpu/drm/i915/intel_ringbuffer.c, branch v4.9.166

drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.

2017-02-01T07:33:14+00:00

commit 4fc020d864647ea3ae8cb8f17d63e48e87ebd0bf upstream. The WaDisableLSQCROPERFforOCL workaround has the side effect of disabling an L3SQ optimization that has huge performance implications and is unlikely to be necessary for the correct functioning of usual graphic workloads. Userspace is free to re-enable the workaround on demand, and is generally in a better position to determine whether the workaround is necessary than the DRM is (e.g. only during the execution of compute kernels that rely on both L3 fences and HDC R/W requests). The same workaround seems to apply to BDW (at least to production stepping G1) and SKL as well (the internal workaround database claims that it does for all steppings, while the BSpec workaround table only mentions pre-production steppings), but the DRM doesn't do anything beyond whitelisting the L3SQCREG4 register so userspace can enable it when it sees fit. Do the same on KBL platforms. Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%, and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master -- This is followed by a regression of 35% and 10% respectively for the same benchmarks and platform caused by my recent patch series switching userspace to use the dataport constant cache instead of the sampler to implement uniform pull constant loads, which caused us to hit more heavily the L3 cache (and on platforms other than KBL had the opposite effect of improving performance of the same two benchmarks). The overall effect on KBL of this change combined with the recent userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf was affected by the constant cache changes (though it improved as it did on other platforms rather than regressing), but is not significantly affected by this patch (with statistical significance of 5% and sample size 20). v2: Drop some more code to avoid unused variable warning. Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256 Signed-off-by: Francisco Jerez Cc: Matthew Auld Cc: Eero Tamminen Cc: Jani Nikula Cc: Mika Kuoppala Cc: beignet@lists.freedesktop.org Reviewed-by: Mika Kuoppala [Removed double Fixes tag] Signed-off-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com (cherry picked from commit 8726f2faa371514fba2f594d799db95203dfeee0) Signed-off-by: Jani Nikula [ Francisco Jerez: Rebase on v4.9 branch. ] Signed-off-by: Francisco Jerez Signed-off-by: Greg Kroah-Hartman

drm/i915: Reset the breadcrumbs IRQ more carefully

2016-10-10T13:06:44+00:00

Along with the interrupt, we want to restore the fake-irq and wait-timeout detection. If we use the breadcrumbs interface to setup the interrupt as it wants, the auxiliary timers will also be restored. v2: Cancel both timers as well, sanitize the IMR. Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson Cc: Mika Kuoppala Reviewed-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-3-chris@chris-wilson.co.uk (cherry picked from commit ad07dfcddf1394e6fed094e7fb426b4242a6814e) Signed-off-by: Jani Nikula

drm/i915: Update reset path to fix incomplete requests

2016-09-09T13:23:05+00:00

Update reset path in preparation for engine reset which requires identification of incomplete requests and associated context and fixing their state so that engine can resume correctly after reset. The request that caused the hang will be skipped and head is reset to the start of breadcrumb. This allows us to resume from where we left-off. Since this request didn't complete normally we also need to cleanup elsp queue manually. This is vital if we employ nonblocking request submission where we may have a web of dependencies upon the hung request and so advancing the seqno manually is no longer trivial. ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS We change the way we count pending batches. Only the active context involved in the reset is marked as either innocent or guilty, and not mark the entire world as pending. By inspection this only affects igt/gem_reset_stats (which assumes implementation details) and not piglit. ARB_robustness gives this guide on how we expect the user of this interface to behave: * Provide a mechanism for an OpenGL application to learn about graphics resets that affect the context. When a graphics reset occurs, the OpenGL context becomes unusable and the application must create a new context to continue operation. Detecting a graphics reset happens through an inexpensive query. And with regards to the actual meaning of the reset values: Certain events can result in a reset of the GL context. Such a reset causes all context state to be lost. Recovery from such events requires recreation of all objects in the affected context. The current status of the graphics reset state is returned by enum GetGraphicsResetStatusARB(); The symbolic constant returned indicates if the GL context has been in a reset state at any point since the last call to GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context has not been in a reset state since the last call. GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected that is attributable to the current GL context. INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that is not attributable to the current GL context. UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose cause is unknown. The language here is explicit in that we must mark up the guilty batch, but is loose enough for us to relax the innocent (i.e. pending) accounting as only the active batches are involved with the reset. In the future, we are looking towards single engine resetting (with minimal locking), where it seems inappropriate to mark the entire world as innocent since the reset occurred on a different engine. Reducing the information available means we only have to encounter the pain once, and also reduces the information leaking from one context to another. v2: Legacy ringbuffer submission required a reset following hibernation, or else we restore stale values to the RING_HEAD and walked over stolen garbage. v3: GuC requires replaying the requests after a reset. v4: Restore engine IRQ after reset (so waiters will be woken!) Rearm hangcheck if resetting with a waiter. Cc: Tvrtko Ursulin Cc: Mika Kuoppala Cc: Arun Siluvery Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk

drm/i915: Perform a direct reset of the GPU from the waiter

2016-09-09T13:23:04+00:00

If a waiter is holding the struct_mutex, then the reset worker cannot reset the GPU until the waiter returns. We do not want to return -EAGAIN form i915_wait_request as that breaks delicate operations like i915_vma_unbind() which often cannot be restarted easily, and returning -EIO is just as useless (and has in the past proven dangerous). The remaining WARN_ON(i915_wait_request) serve as a valuable reminder that handling errors from an indefinite wait are tricky. We can keep the current semantic that knowing after a reset is complete, so is the request, by performing the reset ourselves if we hold the mutex. uevent emission is still handled by the reset worker, so it may appear slightly out of order with respect to the actual reset (and concurrent use of the device). Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk

drm/i915: Mark up all locked waiters

2016-09-09T13:23:03+00:00

In the next patch we want to handle reset directly by a locked waiter in order to avoid issues with returning before the reset is handled. To handle the reset, we must first know whether we hold the struct_mutex. If we do not hold the struct_mtuex we can not perform the reset, but we do not block the reset worker either (and so we can just continue to wait for request completion) - otherwise we must relinquish the mutex. Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk

drm/i915: Expand bool interruptible to pass flags to i915_wait_request()

2016-09-09T13:23:03+00:00

We need finer control over wakeup behaviour during i915_wait_request(), so expand the current bool interruptible to a bitmask. Signed-off-by: Chris Wilson Reviewed-by: Joonas Lahtinen Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk

drm/i915: Make HWS_NEEDS_PHYSICAL the exception

2016-09-07T23:07:09+00:00

Make the .hws_needs_physical the exception by switching the flag on earlier platforms since they are fewer to support. Remove the flag on later GPUs hardware since they all use GTT hws by default. Switch the logic as well in the driver to reflect this change Signed-off-by: Carlos Santa Reviewed-by: Rodrigo Vivi Signed-off-by: Rodrigo Vivi

drm/i915: sseu: Use sseu_dev_info in device info

2016-09-02T15:16:31+00:00

Move all slice/subslice/eu related properties to the sseu_dev_info struct. No functional change. v2: - s/info/sseu/ based on the new struct name. (Ben) Reviewed-by: Robert Bragg (v1) Reviewed-by: Ben Widawsky (v1) Tested-by: Ben Widawsky (v1) Signed-off-by: Imre Deak

drm/i915: Allocate rings from stolen

2016-08-18T21:36:49+00:00

If we have stolen available, make use of it for ringbuffer allocation. Previously this was restricted to !llc platforms, as writing to stolen requires a GGTT mapping - but now that we have partial mappable support, the mappable aperture isn't quite so precious so we can use it more freely and ringbuffers are a good user for the otherwise wasted stolen. Signed-off-by: Chris Wilson Reviewed-by: Joonas Lahtinen Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-18-chris@chris-wilson.co.uk

drm/i915: Allow ringbuffers to be bound anywhere

2016-08-18T21:36:48+00:00

Now that we have WC vmapping available, we can bind our rings anywhere in the GGTT and do not need to restrict them to the mappable region. Except for stolen objects, for which direct access is verbatim and we must use the mappable aperture. Signed-off-by: Chris Wilson Reviewed-by: Joonas Lahtinen Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-17-chris@chris-wilson.co.uk