summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_gem.h
AgeCommit message (Collapse)AuthorFilesLines
2018-06-29drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)Chris Wilson1-0/+5
Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half"), we came to the conclusion that running our CSB processing and ELSP submission from inside the irq handler was a bad idea. A really bad idea as we could impose nearly 1s latency on other users of the system, on average! Deferring our work to a tasklet allowed us to do the processing with irqs enabled, reducing the impact to an average of about 50us. We have since eradicated the use of forcewaked mmio from inside the CSB processing and ELSP submission, bringing the impact down to around 5us (on Kabylake); an order of magnitude better than our measurements 2 years ago on Broadwell and only about 2x worse on average than the gem_syslatency on an unladen system. In this iteration of the tasklet-vs-direct submission debate, we seek a compromise where by we submit new requests immediately to the HW but defer processing the CS interrupt onto a tasklet. We gain the advantage of low-latency and ksoftirqd avoidance when waking up the HW, while avoiding the system-wide starvation of our CS irq-storms. Comparing the impact on the maximum latency observed (that is the time stolen from an RT process) over a 120s interval, repeated several times (using gem_syslatency, similar to RT's cyclictest) while the system is fully laden with i915 nops, we see that direct submission an actually improve the worse case. Maximum latency in microseconds of a third party RT thread (gem_syslatency -t 120 -f 2) x Always using tasklets (a couple of >1000us outliers removed) + Only using tasklets from CS irq, direct submission of requests +------------------------------------------------------------------------+ | + | | + | | + | | + + | | + + + | | + + + + x x x | | +++ + + + x x x x x x | | +++ + ++ + + *x x x x x x | | +++ + ++ + * *x x * x x x | | + +++ + ++ * * +*xxx * x x xx | | * +++ + ++++* *x+**xx+ * x x xxxx x | | **x++++*++**+*x*x****x+ * +x xx xxxx x x | |x* ******+***************++*+***xxxxxx* xx*x xxx + x+| | |__________MA___________| | | |______M__A________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 118 91 186 124 125.28814 16.279137 + 120 92 187 109 112.00833 13.458617 Difference at 95.0% confidence -13.2798 +/- 3.79219 -10.5994% +/- 3.02677% (Student's t, pooled s = 14.9237) However the mean latency is adversely affected: Mean latency in microseconds of a third party RT thread (gem_syslatency -t 120 -f 1) x Always using tasklets + Only using tasklets from CS irq, direct submission of requests +------------------------------------------------------------------------+ | xxxxxx + ++ | | xxxxxx + ++ | | xxxxxx + +++ ++ | | xxxxxxx +++++ ++ | | xxxxxxx +++++ ++ | | xxxxxxx +++++ +++ | | xxxxxxx + ++++++++++ | | xxxxxxxx ++ ++++++++++ | | xxxxxxxx ++ ++++++++++ | | xxxxxxxxxx +++++++++++++++ | | xxxxxxxxxxx x +++++++++++++++ | |x xxxxxxxxxxxxx x + + ++++++++++++++++++ +| | |__A__| | | |____A___| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 3.506 3.727 3.631 3.6321417 0.02773109 + 120 3.834 4.149 4.039 4.0375167 0.041221676 Difference at 95.0% confidence 0.405375 +/- 0.00888913 11.1608% +/- 0.244735% (Student's t, pooled s = 0.03513) However, since the mean latency corresponds to the amount of irqsoff processing we have to do for a CS interrupt, we only need to speed that up to benefit not just system latency but our own throughput. v2: Remember to defer submissions when under reset. v4: Only use direct submission for new requests v5: Be aware that with mixing direct tasklet evaluation and deferred tasklets, we may end up idling before running the deferred tasklet. v6: Remove the redudant likely() from tasklet_is_enabled(), restrict the annotation to reset_in_progress(). v7: Take the full timeline.lock when enabling perf_pmu stats as the tasklet is no longer a valid guard. A consequence is that the stats are now only valid for engines also using the timeline.lock to process state. Testcase: igt/gem_exec_latency/*rthog* References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-9-chris@chris-wilson.co.uk
2018-05-25drm/i915/execlists: Wait for ELSP submission on restartChris Wilson1-0/+6
After a reset, we will ensure that there is at least one request submitted to HW to ensure that a context is loaded for powersaving. Let's wait for this submission via a tasklet to complete before we drop our forcewake, ensuring the system is ready for rc6 before we let it possibly sleep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180522101937.7738-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2018-05-24drm/i915: Look for an active kernel context before switchingChris Wilson1-0/+3
We were not very carefully checking to see if an older request on the engine was an earlier switch-to-kernel-context before deciding to emit a new switch. The end result would be that we could get into a permanent loop of trying to emit a new request to perform the switch simply to flush the existing switch. What we need is a means of tracking the completion of each timeline versus the kernel context, that is to detect if a more recent request has been submitted that would result in a switch away from the kernel context. To realise this, we need only to look in our syncmap on the kernel context and check that we have synchronized against all active rings. v2: Since all ringbuffer clients currently share the same timeline, we do have to use the gem_context to distinguish clients. As a bonus, include all the tracing used to debug the death inside suspend. v3: Test, test, test. Construct a selftest to exercise and assert the expected behaviour that multiple switch-to-contexts do not emit redundant requests. Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
2018-05-16drm/i915: Only sync tasklets once for recursive reset preparationChris Wilson1-0/+7
When setting up reset, we may need to recursively prepare an engine. In which case we should only synchronously flush the tasklets on the outer most call, the inner calls will then be inside an atomic section where the tasklet will never be run (and so the sync will never complete). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-2-chris@chris-wilson.co.uk
2018-04-26drm/i915: Compile out engine debug for releaseChris Wilson1-0/+6
The majority of the engine state dumping is too voluminous to be useful outside of a controlled setup, though a few do accompany severe errors. Keep the debug dumps next to the errors, but hide the others behind a CI compile flag. This becomes more useful when adding more dumps to latency sensitive paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180426103219.22181-1-chris@chris-wilson.co.uk
2018-04-06drm/i915: Split out parking from the idle worker for reuseChris Wilson1-0/+5
We will want to park GEM before disengaging the drive^W^W^W unwedging. Since we already do the work for idling, expose the guts as a new function that we can then reuse. v2: Just skip if already parked; makes it more forgiving to use by future callers. v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to i915_gem_park so that we can evaluate it for symmetry more easily. Calling GEM from inside i915_request looks to be a bit of a layering violation, for the moment I am imaging them as being notify_cb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
2018-03-14drm/i915: Show GEM_TRACE when detecting a failed GPU idleChris Wilson1-0/+2
If we timeout waiting for the GPU to idle, something went seriously wrong. We currently dump the engine state, but we can also dump the ftrace buffer showing our last operations (when available). In passing, note that since commit 559e040f1f08 ("drm/i915: Show the GPU state when declaring wedged", we now show the engine state twice, once in detecting the failed idle and then again on declaring wedged. v2: ftrace_dump() takes a parameter specifying whether to dump all cpu buffers or the local cpu's. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180309101114.1138-1-chris@chris-wilson.co.uk
2018-03-01drm/i915/icl: Prepare for more ringsTvrtko Ursulin1-1/+1
Gen11 will add more VCS and VECS rings so prepare the infrastructure to support that. Bspec: 7021 v2: Rebase. v3: Rebase. v4: Rebase. v5: Rebase. v6: - Update for POR changes. (Daniele Ceraolo Spurio) - Add provisional guc engine ids - to be checked and confirmed. v7: - Rebased. - Added the new ring masks. - Added the new HW ids. v8: - Introduce I915_MAX_VCS/VECS to avoid magic numbers (Michal) v9: increase MAX_ENGINE_INSTANCE to 3 Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180228101153.7224-1-mika.kuoppala@linux.intel.com
2018-02-28drm/i915: Repeat the GEM_BUG_ON message in the ftrace logChris Wilson1-1/+4
As the ftrace log is overflowing the pstore capture, we lose the last gasps from dmesg which includes the GEM_BUG_ON function:line and condition that failed. Vital information for tracking down the bug, so append it to the frace log as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180227211816.5546-1-chris@chris-wilson.co.uk
2017-11-16drm/i915: Print the condition causing GEM_BUG_ONMika Kuoppala1-1/+5
It is easier to categorize and debug bugs if the failed condition is in plain sight in the actual dmesg output. Make it so. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Marta Lofstedt <marta.lofstedt@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171116083954.3357-1-mika.kuoppala@linux.intel.com
2017-11-10drm/i915: Use trace_printk to provide a death rattle for GEMChris Wilson1-0/+6
Trying to enable printk debugging for GEM is fraught with the issue of spam; interactions with HW are very frequent and often boring. However, one instance where they are not so boring is just before a BUG; here ftrace provides a facility to dump its ringbuffer on an oops. So for CI let's enable trace_printk() to capture the last exchanges with HW as a death rattle. For example, [ 79.234110] ------------[ cut here ]------------ [ 79.234137] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:907! [ 79.234145] invalid opcode: 0000 [#1] SMP [ 79.234153] Dumping ftrace buffer: [ 79.234158] --------------------------------- ... [ 79.314044] gem_conc-1059 1..s1 79203443us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.2, seqno=145 [ 79.314089] gem_conc-1059 1..s. 79220800us : intel_lrc_irq_handler: bcs0 csb[1/1]: status=0x00000018:0x00000005 [ 79.314133] gem_conc-1059 1..s. 79220803us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.1, seqno=145 [ 79.314177] gem_conc-1062 2..s1 79230458us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.1, seqno=146 [ 79.314220] gem_conc-1062 2..s1 79230515us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.2, seqno=147 [ 79.314265] gem_conc-1059 1..s1 79230951us : intel_lrc_irq_handler: bcs0 csb[2/3]: status=0x00000012:0x00000008 [ 79.314309] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.2, seqno=147 [ 79.314353] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 csb[3/3]: status=0x00008002:0x00000008 [ 79.314396] gem_conc-1059 1..s1 79230955us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.1, seqno=147 [ 79.314402] --------------------------------- v2: Tweak the formatting to be more consistent between in/out. v3: do {} while (0) stub macro protection Suggested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171109143019.16568-1-chris@chris-wilson.co.uk
2017-05-03drm/i915: Squash repeated awaits on the same fenceChris Wilson1-0/+2
Track the latest fence waited upon on each context, and only add a new asynchronous wait if the new fence is more recent than the recorded fence for that context. This requires us to filter out unordered timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the absence of a universal identifier, we have to use our own i915->mm.unordered_timeline token. v2: Throw around the debug crutches v3: Inline the likely case of the pre-allocation cache being full. v4: Drop the pre-allocation support, we can lose the most recent fence in case of allocation failure -- it just means we may emit more awaits than strictly necessary but will not break. v5: Trim allocation size for leaf nodes, they only need an array of u32 not pointers. v6: Create mock_timeline to tidy selftest writing v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko) v8: Prune the stale sync points when we idle. v9: Include a small benchmark in the kselftests v10: Separate the idr implementation into its own compartment. (Tvrkto) v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto) v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves v13: kselftests to investigate struct i915_syncmap itself (Tvrtko) v14: Foray into ascii art graphs v15: Take into account that the random lookup/insert does 2 prng calls, not 1, when benchmarking, and use for_each_set_bit() (Tvrtko) v16: Improved ascii art Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
2017-02-11drm/i915: Rename conditional GEM execution macrosChris Wilson1-6/+6
After a brief discussion, we settled on a naming convention for the conditional GEM debugging data that should be clearer to the casual user: GEM_DEBUG Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-06drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()Chris Wilson1-0/+9
It is required that the caller declare the exact number of dwords they wish to write into the ring. This is required for two reasons, we need to allocate sufficient space for the entire command packet and we need to be sure that the contents are completely written to avoid executing stale data. The current interface requires for any bug to be caught in review, the reader has to carefully count the number of intel_ring_emit() between intel_ring_begin() and intel_ring_advance(). If we record the end of the packet of each intel_ring_begin() we can also have CI check for us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-1-chris@chris-wilson.co.uk
2016-12-17drm/i915: introduce GEM_WARN_ONMatthew Auld1-0/+2
In a similar spirit to GEM_BUG_ON we now also have GEM_WARN_ON, with the simple goal of expressing warnings which are truly insane, and so are only really useful for CI where we have some abusive tests. v2: - use BUILD_BUG_ON_INVALID for !DEBUG_GEM - clarify commit message Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-1-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-05drm/i915: allow GEM_BUG_ON expr checking with !DEBUG_GEMMatthew Auld1-1/+1
Use BUILD_BUG_ON_INVALID(expr) in GEM_BUG_ON when building without DEBUG_GEM. This means the compiler can now check the validity of expr without generating any code, in turn preventing us from inadvertently breaking the build when DEBUG_GEM is not enabled. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161202184750.3843-1-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-11-08drm/i915: avoid harmless empty-body warningArnd Bergmann1-1/+1
The newly added assert_kernel_context_is_current introduces a warning when built with W=1: drivers/gpu/drm/i915/i915_gem.c: In function ‘assert_kernel_context_is_current’: drivers/gpu/drm/i915/i915_gem.c:4417:63: error: suggest braces around empty body in an ‘else’ statement [-Werror=empty-body] Changing the GEM_BUG_ON() macro from an empty definition to "do { } while (0)" makes the macro more robust to use and avoids the warning. Fixes: 3033acab07f9 ("drm/i915: Queue the idling context switch after all other timelines") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161108135834.2166677-1-arnd@arndb.de
2016-10-28drm/i915: Combine seqno + tracking into a global timeline structChris Wilson1-0/+2
Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
2016-04-14drm/i915: Add GEM debugging Kconfig optionChris Wilson1-0/+34
Currently there is a #define to enable extra BUG_ON for debugging requests and associated activities. I want to expand its use to cover all of GEM internals (so that we can saturate the code with asserts). We can add a Kconfig option to make it easier to enable - with the usual caveats of not enabling unless explicitly requested. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-3-git-send-email-chris@chris-wilson.co.uk