kernel/linux.git/drivers/gpu/drm/i915/intel_ringbuffer.c, branch linux-4.20.y

drm/i915: Flush GPU relocs harder for gen3

2018-12-12T10:27:44+00:00

Adding an extra MI_STORE_DWORD_IMM to the gpu relocation path for gen3 was good, but still not good enough. To survive 24+ hours under test we needed to perform not one, not two but three extra store-dw. Doing so for each GPU relocation was a little unsightly and since we need to worry about userspace hitting the same issues, we should apply the dummy store-dw into the EMIT_FLUSH. Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") References: 7fa28e146994 ("drm/i915: Write GPU relocs harder with gen3") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Reviewed-by: Joonas Lahtinen Link: https://patchwork.freedesktop.org/patch/msgid/20181207134037.11848-1-chris@chris-wilson.co.uk (cherry picked from commit a889580c087a9cf91fddb3832ece284174214183) Signed-off-by: Joonas Lahtinen

drm/i915: Allocate a common scratch page

2018-12-12T10:27:44+00:00

Currently we allocate a scratch page for each engine, but since we only ever write into it for post-sync operations, it is not exposed to userspace nor do we care for coherency. As we then do not care about its contents, we can use one page for all, reducing our allocations and avoid complications by not assuming per-engine isolation. For later use, it simplifies engine initialisation (by removing the allocation that required struct_mutex!) and means that we can always rely on there being a scratch page. v2: Check that we allocated a large enough scratch for I830 w/a Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850 Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Mika Kuoppala Reviewed-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk Cc: Joonas Lahtinen Cc: # v4.18.20+ (cherry picked from commit 5179749925933575a67f9d8f16d0cc204f98a29f) [Joonas: Use new function in gen9_init_indirectctx_bb too] Signed-off-by: Joonas Lahtinen

drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5

2018-11-12T15:07:12+00:00

Exercising the gpu reloc path strenuously revealed an issue where the updated relocations (from MI_STORE_DWORD_IMM) were not being observed upon execution. After some experiments with adding pipecontrols (a lot of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe controls or even the current on), it was discovered that we merely needed to delay the EMIT_INVALIDATE by several flushes. It is important to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that needs the delay as opposed to what one might first expect -- that the delay is required for the TLB invalidation to take effect (one presumes to purge any CS buffers) as opposed to a delay after flushing to ensure the writes have landed before triggering invalidation. Testcase: igt/gem_tiled_fence_blits Signed-off-by: Chris Wilson Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20181105094305.5767-1-chris@chris-wilson.co.uk (cherry picked from commit 55f99bf2a9c331838c981694bc872cd1ec4070b2) Signed-off-by: Joonas Lahtinen

drm/i915/ringbuffer: Reload PDs harder on byt/bcs

2018-09-12T10:02:08+00:00

Baytrail takes a little more convincing that it needs to actually reload its Page Directoy (ppGTT) before the context switch, so repeat it until it gets the message. Once again the arbitrary values here are empirically derived. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107861 Testcase: igt/gem_exec_parallel/fds Signed-off-by: Chris Wilson Reviewed-by: Joonas Lahtinen Link: https://patchwork.freedesktop.org/patch/msgid/20180910130808.10809-1-chris@chris-wilson.co.uk

drm/i915/ringbuffer: Move double invalidate to after pd flush

2018-09-04T13:28:51+00:00

Continuing the fun of trying to find exactly the delay that is sufficient to ensure that the page directory is fully loaded between context switches, move the extra flush added in commit 70b73f9ac113 ("drm/i915/ringbuffer: Delay after invalidating gen6+ xcs") to just after we flush the pd. Entirely based on the empirical data of running failing tests in a loop until we survive a day (before the mtbf is 10-30 minutes). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107769 References: 70b73f9ac113 ("drm/i915/ringbuffer: Delay after invalidating gen6+ xcs") Signed-off-by: Chris Wilson Acked-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20180904063802.13880-1-chris@chris-wilson.co.uk

drm/i915: Use a cached mapping for the physical HWS

2018-09-03T16:55:59+00:00

Older gen use a physical address for the hardware status page, for which we use cache-coherent writes. As the writes are into the cpu cache, we use a normal WB mapped page to read the HWS, used for our seqno tracking. Anecdotally, I observed lost breadcrumbs writes into the HWS on i965gm, which so far have not reoccurred with this patch. How reliable that evidence is remains to be seen. v2: Explicitly pass the expected physical address to the hw v3: Also remember the wild writes we once had for HWS above 4G. Signed-off-by: Chris Wilson Cc: Daniel Vetter Cc: Joonas Lahtinen Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20180903152304.31589-2-chris@chris-wilson.co.uk

drm/i915/ringbuffer: Delay after invalidating gen6+ xcs

2018-08-30T17:26:48+00:00

During stress testing of full-ppgtt (on Baytrail at least), we found that the invalidation around a context/mm switch was insufficient (writes would go astray). Adding a second MI_FLUSH_DW barrier prevents this, but it is unclear as to whether this is merely a delaying tactic or if it is truly serialising with the TLB invalidation. Either way, it is empirically required. v2: Avoid the loop for readability; Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107715 References: https://bugs.freedesktop.org/show_bug.cgi?id=107759 Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: Mika Kuoppala Cc: Matthew Auld Cc: Tvrtko Ursulin Reviewed-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20180830161042.29193-1-chris@chris-wilson.co.uk

drm/i915: Kick waiters on resetting legacy rings

2018-08-14T11:42:29+00:00

This reapplies commit 39f3be162c46 ("drm/i915: Kick waiters on resetting legacy rings") after the improved gem_eio was run across all machines we found that gen3 and early gen4 still lost the immediate interrupt following reset, and the HWSTAM w/a applied to gen6+ is inadequate. Unlike the later gen, on gen3/4 the principle (and only tests to fail so far) are the wait vs reset test cases, whereas the reset stress case works fine (which was the predominantly failing case for gen6+). That is enough to suggest the underlying issue is sufficiently different to support the difference in HWSTAM efficacy. Testcase: igt/gem_eio/wait-10ms References: 39f3be162c46 ("drm/i915: Kick waiters on resetting legacy rings") References: a69ab52b0358 ("drm/i915: Remove extra waiter kick on legacy resets") Signed-off-by: Chris Wilson Cc: Matthew Auld Cc: Mika Kuoppala Reviewed-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20180814104056.27001-1-chris@chris-wilson.co.uk

drm/i915: Remove extra waiter kick on legacy resets

2018-08-08T16:08:08+00:00

Now with a more efficacious workaround for the lost interrupts after reset, we can remove the hack of kicking the waiters after reset. The issue was that the kick only worked for the immediate window after the reset (those seqno that would complete in the time it took for the waiter thread to perform its check) but miss any seqno that lacked an interrupt afterwards. References: 39f3be162c46 ("drm/i915: Kick waiters on resetting legacy rings") Signed-off-by: Chris Wilson Cc: Matthew Auld Cc: Ville Syrjälä Acked-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-3-chris@chris-wilson.co.uk

drm/i915: Unmask user interrupts writes into HWSP on snb/ivb/vlv/hsw

2018-08-08T16:08:07+00:00

An oddity occurs on Sandybridge, Ivybridge and Haswell (and presumably Valleyview) in that for the period following the GPU restart after a reset, there are no GT interrupts received. From Ville's notes, bit 0 in the HWSTAM corresponds to the render interrupt, and if we unmask it we do see immediate resumption of GT interrupt delivery (via the master irq handler) after the reset. v2: Limit the w/a to the render interrupt from rcs Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107500 Fixes: c5498089463b ("drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode") References: d420a50c21ef ("drm/i915: Clean up the HWSTAM mess") Testcase: igt/gem_eio/reset-stress Signed-off-by: Chris Wilson Cc: Ville Syrjälä Acked-by: Mika Kuoppala Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-2-chris@chris-wilson.co.uk