summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/intel_ringbuffer.c
AgeCommit message (Collapse)AuthorFilesLines
2016-08-11drm/i915: fix WaInsertDummyPushConstPsMatthew Auld1-4/+4
As pointed out by Chris Harris, we are using the wrong WA name, it should in fact be WaToEnableHwFixForPushConstHWBug, also it should be applied from C0 onwards for both BXT and KBL. Fixes: 7b9005cd45f3 ("drm/i915: Add WaInsertDummyPushConstP for bxt and kbl") Cc: Chris Harris <chris.harris@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reported-by: Chris Harris <chris.harris@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470127013-29653-1-git-send-email-matthew.auld@intel.com (cherry picked from commit 575e3ccbce4582395d57612b289178bad4af3be8) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-07-25drm/i915/gen9: Add WaInPlaceDecompressionHangMika Kuoppala1-0/+14
Add this workaround to prevent hang when in place compression is used. References: HSD#2135774 Cc: stable@vger.kernel.org Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit 4ba9c1f7c7b8ca8c1d77f65d408e589dc87b9a2d) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-14drm/i915: Unbreak interrupts on pre-gen6Ville Syrjälä1-1/+2
Prior to gen6 we didn't have per-ring IMR registers, which means that since commit 61ff75ac20ff ("drm/i915: Simplify enabling user-interrupts with L3-remapping") we're now masking off all interrupts when init_render_ring() gets called. That's rather rude. Let's limit the ring IMR frobbing to machines that actually have the per-ring IMR registers. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 61ff75ac20ff ("drm/i915: Simplify enabling user-interrupts with L3-remapping") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468340687-3596-1-git-send-email-ville.syrjala@linux.intel.com Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit 035ea405c91e2dc89325a79129cf9af2b9c2ae8e) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-05drm/i915: Convert dev_priv->dev backpointers to dev_priv->drmChris Wilson1-9/+9
Since drm_i915_private is now a subclass of drm_device we do not need to chase the drm_i915_private->dev backpointer and can instead simply access drm_i915_private->drm directly. text data bss dec hex filename 1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko 1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ struct drm_i915_private *d; identifier i; @@ ( - d->dev->i + d->drm.i | - d->dev + &d->drm ) and for good measure the dev_priv->dev backpointer was removed entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
2016-07-04drm/i915: Mass convert dev->dev_private to to_i915(dev)Chris Wilson1-5/+5
Since we now subclass struct drm_device, we can save pointer dances by noting the equivalence of struct drm_device and struct drm_i915_private, i.e. by using to_i915(). text data bss dec hex filename 1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko 1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ expression E; identifier p; @@ - struct drm_i915_private *p = E->dev_private; + struct drm_i915_private *p = to_i915(E); Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
2016-07-04drm/i915: Remove stop-rings debugfs interfaceChris Wilson1-8/+0
Now that we have (near) universal GPU recovery code, we can inject a real hang from userspace and not need any fakery. Not only does this mean that the testing is far more realistic, but we can simplify the kernel in the process. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Simplify enabling user-interrupts with L3-remappingChris Wilson1-23/+11
Borrow the idea from intel_lrc.c to precompute the mask of interrupts we wish to always enable to avoid having lots of conditionals inside the interrupt enabling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-19-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Move the get/put irq locking into the callerChris Wilson1-167/+70
With only a single callsite for intel_engine_cs->irq_get and ->irq_put, we can reduce the code size by moving the common preamble into the caller, and we can also eliminate the reference counting. For completeness, as we are no longer doing reference counting on irq, rename the get/put vfunctions to enable/disable respectively and are able to review the use of posting reads. We only require the serialisation with hardware when enabling the interrupt (i.e. so we cannot miss an interrupt by going to sleep before the hardware truly enables it). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)Chris Wilson1-64/+17
On Ironlake, there is no command nor register to ensure that the write from a MI_STORE command is completed (and coherent on the CPU) before the command parser continues. This means that the ordering between the seqno write and the subsequent user interrupt is undefined (like gen6+). So to ensure that the seqno write is completed after the final user interrupt we need to delay the read sufficiently to allow the write to complete. This delay is undefined by the bspec, and empirically requires 75us even though a register read combined with a clflush is less than 500ns. Hence, the delay is due to an on-chip buffer rather than the latency of the write to memory. Note that the render ring controls this by filling the PIPE_CONTROL fifo with stalling commands that force the earliest pipe-control with the seqno to be completed before the command parser continues. Given that we need a barrier operation for BSD, we may as well forgo the extra per-batch latency by using a common per-interrupt barrier. Studying the impact of adding the usleep shows that in both sequences of and individual synchronous no-op batches is negligible for the media engine (where the write now is unordered with the interrupt). Converting the render engine over from the current glutton of pie-controls over to the per-interrupt delays speeds up both the sequential and individual synchronous no-ops by 20% and 60%, respectively. This speed up holds even when looking at the throughput of small copies (4KiB->4MiB), both serial and synchronous, by about 20%. This is because despite adding a significant delay to the interrupt, in all likelihood we will see the seqno write without having to apply the barrier (only in the rare corner cases where the write is delayed on the last required is the delay necessary). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307 Testcase: igt/gem_sync #ilk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Refactor scratch object allocation for gen2 w/a bufferChris Wilson1-24/+8
The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch buffer. If we pass in the size we want to allocate for the scratch buffer, both callers can use the same routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-11-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Allocate scratch page from stolenChris Wilson1-1/+3
With the last direct CPU access to the scratch page removed, we can now allocate it from our small amount of reserved system pages (stolen memory). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-10-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Stop mapping the scratch page into CPU spaceChris Wilson1-29/+11
After the elimination of using the scratch page for Ironlake's breadcrumb, we no longer need to kmap the object. We therefore can move it into the high unmappable space and do not need to force the object to be coherent (i.e. snooped on !llc platforms). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-9-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Use HWS for seqno tracking everywhereChris Wilson1-48/+17
By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. v2: Fix semaphore_passed() to look at the signaling engine (not the waiter's) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Slaughter the thundering i915_wait_request herdChris Wilson1-1/+11
One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915/ringbuffer: Move all default irq vfuncs init to a separate funcChris Wilson1-19/+24
Just plonk all the default irq vfuncs together in one function to keep the initialisers of reasonable size. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-07-01drm/i915/ringbuffer: Move all generic engine->dispatch_batchbuffer togetherChris Wilson1-13/+14
Consolidate the block of default vfuncs for dispatching the batchbuffer. Just a minor tweak on top of Tvrtko's great job of tidying up the vfunc initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-30drm/i915: Trim some if-else bracesTvrtko Ursulin1-9/+6
Just a bit of cleanup after the previous refactoring. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467212972-861-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-06-30drm/i915: Consolidate legacy semaphore initializationTvrtko Ursulin1-62/+48
Replace per-engine initialization with a common half-programatic, half-data driven code for ease of maintenance and compactness. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Compact gen8_ring_syncTvrtko Ursulin1-4/+3
Store the semaphore offset in a temporary variable to avoid having to get the VMA offset twice. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Compact Gen8 semaphore initializationTvrtko Ursulin1-2/+14
Replace the macro initializer with a programatic loop which results in smaller code and hopefully just as clear. v2: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Move semaphore object creation into intel_ring_init_semaphoresTvrtko Ursulin1-20/+25
The object needs to be created before semaphores can be initialized on any ring and it makes sense to pull it out to this semaphore dedicated helper. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate semaphore vfuncs initTvrtko Ursulin1-30/+18
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-30drm/i915: Consolidate dispatch_execbuffer vfuncTvrtko Ursulin1-19/+8
v2: Put dispatch_execbuffer before add_request. (Chris Wilson) v3: Fix add_request and irq_seqno_barrier for gen8+. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate init_hw vfuncTvrtko Ursulin1-4/+1
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate get/set_seqnoTvrtko Ursulin1-16/+2
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate get and put irq vfuncsTvrtko Ursulin1-29/+17
v2: Consistent INTEL_GEN vs IS_GEN usage. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate seqno_barrier vfuncTvrtko Ursulin1-7/+4
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate add_request vfuncTvrtko Ursulin1-7/+5
All engines apart from render select this based on Gen. Move it to the common helper as well. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Consolidate write_tail vfunc initializerTvrtko Ursulin1-8/+19
Introduce a function which initializes vfuncs mostly common across engines and move write_tail initialization in it since only one engine overrides the default. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30drm/i915: Perform Sandybridge BSD tail write under the forcewakeChris Wilson1-12/+16
Since we have a sequence of register reads and writes, we can reduce the latency of starting the BSD ring by performing all the mmio operations under the same forcewake wakeref. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-62-git-send-email-chris@chris-wilson.co.uk
2016-06-30drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register()Chris Wilson1-3/+5
By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-48-git-send-email-chris@chris-wilson.co.uk
2016-06-30drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register()Chris Wilson1-1/+5
By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-47-git-send-email-chris@chris-wilson.co.uk
2016-06-30drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register()Chris Wilson1-2/+3
By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-46-git-send-email-chris@chris-wilson.co.uk
2016-06-24drm/i915: Treat kernel context as initialisedChris Wilson1-0/+10
The kernel context exists simply as a placeholder and should never be executed with a render context. It does not need the golden render state, as that will always be applied to a user context. By skipping the initialisation we can avoid issues in attempting to program the golden render context when trying to make the hardware idle. v2: Rebase Testcase: igt/drm_module_reload_basic #byt Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95634 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-3-git-send-email-chris@chris-wilson.co.uk
2016-06-24drm/i915: Move legacy kernel context pinning to intel_ringbuffer.cChris Wilson1-0/+55
This is so that we have symmetry with intel_lrc.c and avoid a source of if (i915.enable_execlists) layering violation within i915_gem_context.c - that is we move the specific handling of the dev_priv->kernel_context for legacy submission into the legacy submission code. This depends upon the init/fini ordering between contexts and engines already defined by intel_lrc.c, and also exporting the context alignment required for pinning the legacy context. v2: Separate out pin/unpin context funcs for greater symmetry with intel_lrc. One more step towards unifying behaviour between the two classes of engines and towards fixing another bug in i915_switch_context vs requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-2-git-send-email-chris@chris-wilson.co.uk
2016-06-14drm/i915/bxt: Add WaDisablePooledEuLoadBalancingFixarun.siluvery@linux.intel.com1-0/+6
This is a WA affecting pooled eu which is a bxt specific feature. Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Winiarski, Michal <michal.winiarski@intel.com> Cc: Zou, Nanhai <nanhai.zou@intel.com> Cc: Yang, Rong R <rong.r.yang@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-13drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidateTim Gore1-0/+3
This patch enables a workaround for a mid thread preemption issue where a hardware timing problem can prevent the context restore from happening, leading to a hang. v2: move to gen9_init_workarounds (Arun) v3: move to start of gen9_init_workarounds (Arun) Signed-off-by: Tim Gore <tim.gore@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465816501-25557-1-git-send-email-tim.gore@intel.com
2016-06-08drm/i915/skl: Extend WaDisableChickenBitTSGBarrierAckForFFSliceCSMika Kuoppala1-1/+1
There is ambiguity in the documentation between D0 and E0. Extend this workaround to E0. References: BSID#779 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-23-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharingMika Kuoppala1-0/+5
This is needed for all kbl revision. v2: Don't add revid checks to generic gen9 init (Arun) References: HSD#2135593 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-21-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Add WaDisableGafsUnitClkGatingMika Kuoppala1-0/+3
We need to disable clock gating in this unit to work around hardware issue causing possible corruption/hang. v2: name the bit (Ville) v3: leave the fix enabled for 2227050 and set correct bit (Matthew) v4: Split out the skl part in separate commit for easier backport References: HSD#2227156, HSD#2227050 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-20-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915: Add WaInsertDummyPushConstP for bxt and kblMika Kuoppala1-0/+10
Add this workaround for both bxt and kbl up to until rev B0. References: HSD#2136703 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-16-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Add WaDisableDynamicCreditSharingMika Kuoppala1-0/+5
Bspec states that we need to turn off dynamic credit sharing on kbl revid a0 and b0. This happens by writing bit 28 on 0x4ab8. References: HSD#2225601, HSD#2226938, HSD#2225763 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-15-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Add WaDisableLSQCROPERFforOCLMika Kuoppala1-0/+13
Extend the scope of this workaround, already used in skl, to also take effect in kbl. v2: Fix KBL_REVID_E0 (Matthew) References: HSD#2132677 Cc: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-12-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0Mika Kuoppala1-0/+5
Add this workaround for kbl revid A0 only. v2: rebase v3: carve out a non related workaround (Chris) References: HSD#1911714 Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-9-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Add WaEnableGapsTsvCreditFixMika Kuoppala1-0/+5
We need this crucial workaround from skl also to all kbl revisions. Lack of it was causing system hangs on skl enabling so this is a must have. v2: Don't add revid checks to gen9 init workarounds (Arun) References: HSD#2126660 Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-8-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915: Mimic skl with WaForceEnableNonCoherentMika Kuoppala1-16/+21
Past evidence with system hangs and hsds tie WaForceEnableNonCoherent and WaDisableHDCInvalidation to WaForceContextSaveRestoreNonCoherent. Documentation states that WaForceContextSaveRestoreNonCoherent would not be needed on skl past E0 but evidence proved otherwise. See commit <510650e8b2ab> ("drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs"). In this scope consider kbl to be skl with a bigger revision than E0 so play it safe and bind these two workarounds to the WaForceContextSaveRestoreNonCoherent, and apply to all gen9. v2: fix comment (Matthew) References: HSD#2134449, HSD#2131413 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-7-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/gen9: Always apply WaForceContextSaveRestoreNonCoherentMika Kuoppala1-7/+4
The revision id range for this workaround has changed. So apply it to all revids on all gen9. References: HSD#2134449 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-6-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/kbl: Init gen9 workaroundsMika Kuoppala1-16/+32
Kabylake is part of gen9 family so init the generic gen9 workarounds for it. v2: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-3-git-send-email-mika.kuoppala@intel.com
2016-06-08drm/i915/skl: Add WaDisableGafsUnitClkGatingMika Kuoppala1-0/+3
We need to disable clock gating in this unit to work around hardware issue causing possible corruption/hang. v2: name the bit (Ville) v3: leave the fix enabled for 2227050 and set correct bit (Matthew) References: HSD#2227156, HSD#2227050 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-2-git-send-email-mika.kuoppala@intel.com
2016-06-06drm/i915/gen9: Add WaVFEStateAfterPipeControlwithMediaStateCleararun.siluvery@linux.intel.com1-0/+5
Kernel only need to add a register to HW whitelist, required for a preemption related issue. Reference: HSD#2131039 Reviewed-by: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465203169-16591-1-git-send-email-arun.siluvery@linux.intel.com