summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_request.c
AgeCommit message (Collapse)AuthorFilesLines
2019-02-26drm/i915: Remove i915_request.global_seqnoChris Wilson1-29/+5
Having weaned the interrupt handling off using a single global execution queue, we no longer need to emit a global_seqno. Note that we still have a few assumptions about execution order along engine timelines, but this removes the most obvious artefact! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk
2019-02-26drm/i915: Remove access to global seqno in the HWSPChris Wilson1-17/+10
Stop accessing the HWSP to read the global seqno, and stop tracking the mirror in the engine's execution timeline -- it is unused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk
2019-02-20drm/i915: Beware temporary wedging when determining -EIOChris Wilson1-2/+3
At a few points in our uABI, we check to see if the driver is wedged and report -EIO back to the user in that case. However, as we perform the check and reset asynchronously (where once before they were both serialised by the struct_mutex), we may instead see the temporary wedging used to cancel inflight rendering to avoid a deadlock during reset (caused by either us timing out in our reset handler, i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare for a stuck modeset). If we suspect this is the case, that is we see a wedged driver *and* reset in progress, then wait until the reset is resolved before reporting upon the wedged status. v2: might_sleep() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk
2019-02-19drm/i915: Use time based guilty context banningChris Wilson1-2/+0
Currently, we accumulate each time a context hangs the GPU, offset against the number of requests it submits, and if that score exceeds a certain threshold, we ban that context from submitting any more requests (cancelling any work in flight). In contrast, we use a simple timer on the file, that if we see more than a 9 hangs faster than 60s apart in total across all of its contexts, we will ban the client from creating any more contexts. This leads to a confusing situation where the file may be banned before the context, so lets use a simple timer scheme for each. If the context submits 3 hanging requests within a 120s period, declare it forbidden to ever send more requests. This has the advantage of not being easy to repair by simply sending empty requests, but has the disadvantage that if the context is idle then it is forgiven. However, if the context is idle, it is not disrupting the system, but a hog can evade the request counting and cause much more severe disruption to the system. Updating ban_score from request retirement is dubious as the retirement is purposely not in sync with request submission (i.e. we try and batch retirement to reduce overhead and avoid latency on submission), which leads to surprising situations where we can forgive a hang immediately due to a backlog of requests from before the hang being retired afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-2-chris@chris-wilson.co.uk
2019-02-15drm/i915: Defer application of request banning to submissionChris Wilson1-0/+3
As we currently do not check on submission whether the context is banned in a timely manner it is possible for some requests to escape cancellation after their parent context is banned. By moving the ban into the request submission under the engine->timeline.lock, we serialise it with the reset and setting of the context ban. References: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190213182737.12695-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2019-02-13drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()Chris Wilson1-2/+19
As time goes by, usage of generic ioctls such as drm_syncobj and sync_file are on the increase bypassing i915-specific ioctls like GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as we track the file/client and account the waitboosting to them. However, since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for signaling threads"), we no longer have been applying the client ratelimiting on waitboosts and so that information has only been used for debug tracking. Push the application of waitboosting down to the common i915_request_wait, and apply it to all foreign fence waits as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190213092504.25709-1-chris@chris-wilson.co.uk
2019-02-05drm/i915: Pull i915_gem_active into the i915_active familyChris Wilson1-24/+11
Looking forward, we need to break the struct_mutex dependency on i915_gem_active. In the meantime, external use of i915_gem_active is quite beguiling, little do new users suspect that it implies a barrier as each request it tracks must be ordered wrt the previous one. As one of many, it can be used to track activity across multiple timelines, a shared fence, which fits our unordered request submission much better. We need to steer external users away from the singular, exclusive fence imposed by i915_gem_active to i915_active instead. As part of that process, we move i915_gem_active out of i915_request.c into i915_active.c to start separating the two concepts, and rename it to i915_active_request (both to tie it to the concept of tracking just one request, and to give it a longer, less appealing name). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
2019-02-05drm/i915: Add timeline barrier supportTvrtko Ursulin1-0/+17
Timeline barrier allows serialization between different timelines. After calling i915_timeline_set_barrier with a request, all following submissions on this timeline will be set up as depending on this request, or barrier. Once the barrier has been completed it automatically gets cleared and things continue as normal. This facility will be used by the upcoming context SSEU code. v2: * Assert barrier has been retired on timeline_fini. (Chris Wilson) * Fix mock_timeline. v3: * Improved comment language. (Chris Wilson) v4: * Maintain ordering with previous barriers set on the timeline. v5: * Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-3-tvrtko.ursulin@linux.intel.com
2019-02-05drm/i915: Trim NEWCLIENT boostingChris Wilson1-1/+1
Limit the NEWCLIENT boost to only give its small priority boost to fresh clients only that have no dependencies. The idea for using NEWCLIENT boosting, commit b16c765122f9 ("drm/i915: Priority boost for new clients"), is that short-lived streams are often interactive and require lower latency -- and that by executing those ahead of the long running hogs, the short-lived clients do little to interfere with the system throughput by virtue of their short-lived nature. However, we were only considering the client's own timeline for determining whether or not it was a fresh stream. This allowed for compositors to wake up before their vblank and bump all of its client streams. However, in testing with media-bench this results in chaining all cooperating contexts together preventing us from being able to reorder contexts to reduce bubbles (pipeline stalls), overall increasing latency, and reducing system throughput. The exact opposite of our intent. The compromise of applying the NEWCLIENT boost to strictly fresh clients (that do not wait upon anything else) should maintain the "real-time response under load" characteristics of FQ_CODEL, without locking together the long chains of dependencies across the system. References: b16c765122f9 ("drm/i915: Priority boost for new clients") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190204150101.30759-1-chris@chris-wilson.co.uk
2019-01-30drm/i915: Replace global breadcrumbs with per-context interrupt trackingChris Wilson1-93/+49
A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd"), the issue of handling multiple clients waiting in parallel was brought to our attention. The requirement was that every client should be woken immediately upon its request being signaled, without incurring any cpu overhead. To handle certain fragility of our hw meant that we could not do a simple check inside the irq handler (some generations required almost unbounded delays before we could be sure of seqno coherency) and so request completion checking required delegation. Before commit 688e6c725816, the solution was simple. Every client waiting on a request would be woken on every interrupt and each would do a heavyweight check to see if their request was complete. Commit 688e6c725816 introduced an rbtree so that only the earliest waiter on the global timeline would woken, and would wake the next and so on. (Along with various complications to handle requests being reordered along the global timeline, and also a requirement for kthread to provide a delegate for fence signaling that had no process context.) The global rbtree depends on knowing the execution timeline (and global seqno). Without knowing that order, we must instead check all contexts queued to the HW to see which may have advanced. We trim that list by only checking queued contexts that are being waited on, but still we keep a list of all active contexts and their active signalers that we inspect from inside the irq handler. By moving the waiters onto the fence signal list, we can combine the client wakeup with the dma_fence signaling (a dramatic reduction in complexity, but does require the HW being coherent, the seqno must be visible from the cpu before the interrupt is raised - we keep a timer backup just in case). Having previously fixed all the issues with irq-seqno serialisation (by inserting delays onto the GPU after each request instead of random delays on the CPU after each interrupt), we can rely on the seqno state to perfom direct wakeups from the interrupt handler. This allows us to preserve our single context switch behaviour of the current routine, with the only downside that we lose the RT priority sorting of wakeups. In general, direct wakeup latency of multiple clients is about the same (about 10% better in most cases) with a reduction in total CPU time spent in the waiter (about 20-50% depending on gen). Average herd behaviour is improved, but at the cost of not delegating wakeups on task_prio. v2: Capture fence signaling state for error state and add comments to warm even the most cold of hearts. v3: Check if the request is still active before busywaiting v4: Reduce the amount of pointer misdirection with list_for_each_safe and using a local i915_request variable inside the loops v5: Add a missing pluralisation to a purely informative selftest message. References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
2019-01-29drm/i915: Identify active requestsChris Wilson1-5/+5
To allow requests to forgo a common execution timeline, one question we need to be able to answer is "is this request running?". To track whether a request has started on HW, we can emit a breadcrumb at the beginning of the request and check its timeline's HWSP to see if the breadcrumb has advanced past the start of this request. (This is in contrast to the global timeline where we need only ask if we are on the global timeline and if the timeline has advanced past the end of the previous request.) There is still confusion from a preempted request, which has already started but relinquished the HW to a high priority request. For the common case, this discrepancy should be negligible. However, for identification of hung requests, knowing which one was running at the time of the hang will be much more important. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
2019-01-28drm/i915: Track the context's seqno in its own timeline HWSPChris Wilson1-1/+2
Now that we have allocated ourselves a cacheline to store a breadcrumb, we can emit a write from the GPU into the timeline's HWSP of the per-context seqno as we complete each request. This drops the mirroring of the per-engine HWSP and allows each context to operate independently. We do not need to unwind the per-context timeline, and so requests are always consistent with the timeline breadcrumb, greatly simplifying the completion checks as we no longer need to be concerned about the global_seqno changing mid check. One complication though is that we have to be wary that the request may outlive the HWSP and so avoid touching the potentially danging pointer after we have retired the fence. We also have to guard our access of the HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe. At this point, we are emitting both per-context and global seqno and still using the single per-engine execution timeline for resolving interrupts. v2: s/fake_complete/mark_complete/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
2019-01-28drm/i915: Introduce concept of per-timeline (context) HWSPChris Wilson1-5/+11
Supplement the per-engine HWSP with a per-timeline HWSP. That is a per-request pointer through which we can check a local seqno, abstracting away the presumption of a global seqno. In this first step, we point each request back into the engine's HWSP so everything continues to work with the global timeline. v2: s/i915_request_hwsp/hwsp_seqno/ to emphasis that this is the current HW value and that we are accessing it via i915_request merely as a convenience. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-1-chris@chris-wilson.co.uk
2019-01-25drm/i915: Remove GPU reset dependence on struct_mutexChris Wilson1-47/+0
Now that the submission backends are controlled via their own spinlocks, with a wave of a magic wand we can lift the struct_mutex requirement around GPU reset. That is we allow the submission frontend (userspace) to keep on submitting while we process the GPU reset as we can suspend the backend independently. The major change is around the backoff/handoff strategy for performing the reset. With no mutex deadlock, we no longer have to coordinate with any waiter, and just perform the reset immediately. Testcase: igt/gem_mmap_gtt/hang # regresses Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk
2019-01-25drm/i915: Remove manual breadcumb countingChris Wilson1-2/+2
Now that we know we measure the size of the engine->emit_breadcrumb() correctly, we can remove the previous manual counting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125120005.25191-1-chris@chris-wilson.co.uk
2019-01-23Merge drm/drm-next into drm-intel-next-queuedRodrigo Vivi1-6/+6
We need avi infoframe stuff who got merged via drm-misc Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-01-22drm/i915: Tidy common test_bit probing of i915_request->fence.flagsChris Wilson1-1/+1
A repeated pattern is to test the signaled bit of our request->fence.flags. Make this an inline to shorten a few lines and remove unnecessary line continuations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190121222117.23305-20-chris@chris-wilson.co.uk
2019-01-21drm/i915: Prevent use of global_seqno=0Chris Wilson1-1/+8
We are not allowed to assign rq->global_seqno=0 as it has a special meaning of "inactive" (not executing on HW). Fixes: 6faf5916e6be ("drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190119143024.26971-1-chris@chris-wilson.co.uk
2019-01-17drm/i915: Pull all the reset functionality together into i915_reset.cChris Wilson1-0/+1
Currently the code to reset the GPU and our state is spread widely across a few files. Pull the logic together into a common file. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190116153304.787-1-chris@chris-wilson.co.uk
2019-01-10drm/i915: Reduce i915_request_alloc retirement to local contextChris Wilson1-22/+33
In the continual quest to reduce the amount of global work required when submitting requests, replace i915_retire_requests() after allocation failure to retiring just our ring. v2: Don't forget the list iteration included an early break, so we would never throttle on the last request in the ring/timeline. v3: Use the common ring_retire_requests() References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190109215932.26454-1-chris@chris-wilson.co.uk
2019-01-09Merge tag 'drm-misc-next-2019-01-07-1' of ↵Dave Airlie1-6/+6
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.1: UAPI Changes: Cross-subsystem Changes: - Turn dma-buf fence sequence numbers into 64 bit numbers Core Changes: - Move to a common helper for the DP MST hotplug for radeon, i915 and amdgpu - i2c improvements for drm_dp_mst - Removal of drm_syncobj_cb - Introduction of an helper to create and attach the TV margin properties Driver Changes: - Improve cache flushes for v3d - Reflection support for vc4 - HDMI overscan support for vc4 - Add implicit fencing support for rockchip and sun4i - Switch to generic fbdev emulation for virtio Signed-off-by: Dave Airlie <airlied@redhat.com> [airlied: applied amdgpu merge fixup] From: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190107180333.amklwycudbsub3s5@flea
2018-12-31drm/i915: Drop unused engine->irq_seqno_barrier w/aChris Wilson1-7/+1
Now that we have eliminated the CPU-side irq_seqno_barrier by moving the delays on the GPU before emitting the MI_USER_INTERRUPT, we can remove the engine->irq_seqno_barrier infrastructure. Though intentionally slowing down the GPU is nasty, so is the code we can now remove! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-6-chris@chris-wilson.co.uk
2018-12-31drm/i915: Remove redundant trailing request flushChris Wilson1-7/+7
Now that we perform the request flushing inline with emitting the breadcrumb, we can remove the now redundant manual flush. And we can also remove the infrastructure that remained only for its purpose. v2: emit_breadcrumb_sz is in dwords, but rq->reserved_space is in bytes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-1-chris@chris-wilson.co.uk
2018-12-28drm/i915: Remove HW semaphores for gen7 inter-engine synchronisationChris Wilson1-120/+6
The writing is on the wall for the existence of a single execution queue along each engine, and as a consequence we will not be able to track dependencies along the HW queue itself, i.e. we will not be able to use HW semaphores on gen7 as they use a global set of registers (and unlike gen8+ we can not effectively target memory to keep per-context seqno and dependencies). On the positive side, when we implement request reordering for gen7 we also can not presume a simple execution queue and would also require removing the current semaphore generation code. So this bring us another step closer to request reordering for ringbuffer submission! The negative side is that using interrupts to drive inter-engine synchronisation is much slower (4us -> 15us to do a nop on each of the 3 engines on ivb). This is much better than it was at the time of introducing the HW semaphores and equally important userspace weaned itself off intermixing dependent BLT/RENDER operations (the prime culprit was glyph rendering in UXA). So while we regress the microbenchmarks, it should not impact the user. References: https://bugs.freedesktop.org/show_bug.cgi?id=108888 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk
2018-12-07drm/i915: Compile fix for 64b dma-fence seqnoMika Kuoppala1-6/+6
Many errs of the form: drivers/gpu/drm/i915/selftests/intel_hangcheck.c: In function ‘__igt_reset_evict_vma’: ./include/linux/kern_levels.h:5:18: error: format ‘%x’ expects argument of type ‘unsigned int’, but argum Fixes: b312d8ca3a7c ("dma-buf: make fence sequence numbers 64 bit v2") Cc: Christian König <christian.koenig@amd.com> Cc: Chunming Zhou <david1.zhou@amd.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181207123428.16257-1-mika.kuoppala@linux.intel.com
2018-12-07drm/i915: Push EMIT_INVALIDATE at request start to backendsChris Wilson1-5/+0
Move the common engine->emit_flush(EMIT_INVALIDATE) back to the backends (where it was once previously) as we seek to specialise it in future patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181207090213.14352-1-chris@chris-wilson.co.uk
2018-11-27drm/i915: Skip engine serialisation for no-op seqno resetChris Wilson1-0/+3
If the engine's seqno is already at our target seqno (most likely it hasn't been used since the last reset), we can skip serialising the engine and leave it as is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181126095610.20962-1-chris@chris-wilson.co.uk
2018-10-26drm/i915: Park signaling thread while wrapping the seqnoChris Wilson1-0/+4
A danger encountered when resetting the seqno (using debugfs/i915_next_seqno) is that as we change the breadcrumb stored in the HWSP, it may be inspected by the signaler thread leading to confusion in our sanity checks. <0> [136.331342] i915/sig-347 3..s1 136336154us : execlists_submission_tasklet: rcs0 awake?=1, active=5 <0> [136.331373] i915/sig-347 3d.s2 136336155us : process_csb: rcs0 cs-irq head=5, tail=0 <0> [136.331402] i915/sig-347 3d.s2 136336155us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000002, active=0x5 <0> [136.331434] i915/sig-347 3d.s2 136336156us : process_csb: rcs0 out[0]: ctx=2.1, global=219 (fence 46:8455) (current 219), prio=0 <0> [136.331466] i915/sig-347 3d.s2 136336156us : process_csb: rcs0 completed ctx=2 <0> [136.332027] gem_exec-1049 0.... 136336246us : reset_all_global_seqno.part.5: rcs0 seqno 219 (current 219) -> -43 <0> [136.332056] gem_exec-1049 0.... 136336251us : reset_all_global_seqno.part.5: bcs0 seqno 183 (current 183) -> -43 <0> [136.332085] gem_exec-1049 0.... 136336255us : reset_all_global_seqno.part.5: vcs0 seqno 191 (current 191) -> -43 <0> [136.332114] gem_exec-1049 0.... 136336259us : reset_all_global_seqno.part.5: vcs1 seqno 180 (current 180) -> -43 <0> [136.332143] gem_exec-1049 0.... 136336262us : reset_all_global_seqno.part.5: vecs0 seqno 212 (current 212) -> -43 <0> [136.332174] i915/sig-347 3.... 136336280us : intel_breadcrumbs_signaler: intel_breadcrumbs_signaler:673 GEM_BUG_ON(!i915_request_completed(rq)) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181024104939.2861-1-chris@chris-wilson.co.uk
2018-10-05drm/i915: Remove the global cache shrink & rcu barrier on allocation failureChris Wilson1-11/+0
Earlier, we reasoned that having idled the gpu under mempressure, that would be a good time to trim our request slabs in order to perform the next request allocation. We have stopped performing the global operation on the device (no idling) and wish to make the allocation failure handling more local, so out with the global barrier that may take a long time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181005080300.9908-2-chris@chris-wilson.co.uk
2018-10-01drm/i915: Priority boost for waiting clientsChris Wilson1-0/+2
Latency is in the eye of the beholder. In the case where a client stops and waits for the gpu, give that request chain a small priority boost (not so that it overtakes higher priority clients, to preserve the external ordering) so that ideally the wait completes earlier. v2: Tvrtko recommends to keep the boost-from-user-stall as small as possible and to allow new client flows to be preferred for interactivity over stalls. Testcase: igt/gem_sync/switch-default Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk
2018-10-01drm/i915: Pull scheduling under standalone lockChris Wilson1-85/+0
Currently, the backend scheduling code abuses struct_mutex into order to have a global lock to manipulate a temporary list (without widespread allocation) and to protect against list modifications. This is an extraneous coupling to struct_mutex and further can not extend beyond the local device. Pull all the code that needs to be under the one true lock into i915_scheduler.c, and make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk
2018-10-01drm/i915: Priority boost for new clientsChris Wilson1-2/+14
Taken from an idea used for FQ_CODEL, we give the first request of a new request flows a small priority boost. These flows are likely to correspond with short, interactive tasks and so be more latency sensitive than the longer free running queues. As soon as the client has more than one request in the queue, further requests are not boosted and it settles down into ordinary steady state behaviour. Such small kicks dramatically help combat the starvation issue, by allowing each client the opportunity to run even when the system is under heavy throughput load (within the constraints of the user selected priority). v2: Mark the preempted request as the start of a new flow, to prevent a single client being continually gazumped by its peers. Testcase: igt/benchmarks/rrul Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-1-chris@chris-wilson.co.uk
2018-09-14drm/i915: Limit the backpressure for i915_request allocationChris Wilson1-6/+8
If we try and fail to allocate a i915_request, we apply some backpressure on the clients to throttle the memory allocations coming from i915.ko. Currently, we wait until completely idle, but this is far too heavy and leads to some situations where the only escape is to declare a client hung and reset the GPU. The intent is to only ratelimit the allocation requests and to allow ourselves to recycle requests and memory from any long queues built up by a client hog. Although the system memory is inherently a global resources, we don't want to overly penalize an unlucky client to pay the price of reaping a hog. To reduce the influence of one client on another, we can instead of waiting for the entire GPU to idle, impose a barrier on the local client. (One end goal for request allocation is for scalability to many concurrent allocators; simultaneous execbufs.) To prevent ourselves from getting caught out by long running requests (requests that may never finish without userspace intervention, whom we are blocking) we need to impose a finite timeout, ideally shorter than hangcheck. A long time ago Paul McKenney suggested that RCU users should ratelimit themselves using judicious use of cond_synchronize_rcu(). This gives us the opportunity to reduce our indefinite wait for the GPU to idle to a wait for the RCU grace period of the previous allocation along this timeline to expire, satisfying both the local and finite properties we desire for our ratelimiting. There are still a few global steps (reclaim not least amongst those!) when we exhaust the immediate slab pool, at least now the wait is itself decoupled from struct_mutex for our glorious highly parallel future! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk
2018-08-07drm/i915: Pull seqno started checks togetherChris Wilson1-5/+4
We have a few instances of checking seqno-1 to see if the HW has started the request. Pull those together under a helper. v2: Pull the !seqno assertion higher, as given seqno==1 we may indeed check to see if we have started using seqno==0. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180806112605.20725-1-chris@chris-wilson.co.uk
2018-07-09drm/i915: Provide a timeout to i915_gem_wait_for_idle()Chris Wilson1-2/+4
Usually we have no idea about the upper bound we need to wait to catch up with userspace when idling the device, but in a few situations we know the system was idle beforehand and can provide a short timeout in order to very quickly catch a failure, long before hangcheck kicks in. In the following patches, we will use the timeout to curtain two overly long waits, where we know we can expect the GPU to complete within a reasonable time or declare it broken. In particular, with a broken GPU we expect it to fail during the initial GPU setup where do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. In this a case where any gpu hang is unacceptable, both from a timeliness and practical standpoint. The other improvement is that in selftests, we do not need to arm an independent timer to inject a wedge, as we can just limit the timeout on the wait directly. v2: Include the timeout parameter in the trace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
2018-07-07drm/i915: Replace nested subclassing with explicit subclassesChris Wilson1-1/+1
In the next patch, we will want a third distinct class of timeline that may overlap with the current pair of client and engine timeline classes. Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise the different timeline classes with an explicit subclass. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
2018-07-06drm/i915: Export i915_request_skip()Chris Wilson1-0/+21
In the next patch, we will want to start skipping requests on failing to complete their payloads. So export the utility function current used to make requests inoperable following a failed gpu reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk
2018-06-28drm/i915: Only signal from interrupt when requestedChris Wilson1-1/+1
Avoid calling dma_fence_signal() from inside the interrupt if we haven't enabled signaling on the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-4-chris@chris-wilson.co.uk
2018-06-28drm/i915: Move the irq_counter inside the spinlockChris Wilson1-2/+2
Rather than have multiple locked instructions inside the notify_ring() irq handler, move them inside the spinlock and reduce their intrinsic locking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-3-chris@chris-wilson.co.uk
2018-06-14drm/i915: Make closing request flush mandatoryChris Wilson1-16/+2
For symmetry, simplicity and ensuring the request is always truly idle upon its completion, always emit the closing flush prior to emitting the request breadcrumb. Previously, we would only emit the flush if we had started a user batch, but this just leaves all the other paths open to speculation (do they affect the GPU caches or not?) With mm switching, a key requirement is that the GPU is flushed and invalidated before hand, so for absolute safety, we want that closing flush be mandatory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
2018-06-11drm/i915/ringbuffer: Fix context restore upon resetChris Wilson1-0/+2
The discovery with trying to enable full-ppgtt was that we were completely failing to the load both the mm and context following the reset. Although we were performing mmio to set the PP_DIR (per-process GTT) and CCID (context), these were taking no effect (the assumption was that this would trigger reload of the context and restore the page tables). It was not until we performed the LRI + MI_SET_CONTEXT in a following context switch would anything occur. Since we are then required to reset the context image and PP_DIR using CS commands, we place those commands into every batch. The hardware should recognise the no-ops and eliminate the expensive context loads, but we still have to pay the cost of using cross-powerwell register writes. In practice, this has no effect on actual context switch times, and only adds a few hundred nanoseconds to no-op switches. We can improve the latter by eliminating the w/a around known no-op switches, but there is an ulterior motive to keeping them. Always emitting the context switch at the beginning of the request (and relying on HW to skip unneeded switches) does have one key advantage. Should we implement request reordering on Haswell, we will not know in advance what the previous executing context was on the GPU and so we would not be able to elide the MI_SET_CONTEXT commands ourselves and always have to emit them. Having our hand forced now actually prepares us for later. Now since that context and mm follow the request, we no longer (and not for a long time since requests took over!) require a trace point to tell when we write the switch into the ring, since it is always. (This is even more important when you remember that simply writing into the ring bears no relation to the current mm.) v2: Sandybridge has to agree to use LRI as well. Testcase: igt/drv_selftests/live_hangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
2018-05-24drm/i915: Look for an active kernel context before switchingChris Wilson1-1/+4
We were not very carefully checking to see if an older request on the engine was an earlier switch-to-kernel-context before deciding to emit a new switch. The end result would be that we could get into a permanent loop of trying to emit a new request to perform the switch simply to flush the existing switch. What we need is a means of tracking the completion of each timeline versus the kernel context, that is to detect if a more recent request has been submitted that would result in a switch away from the kernel context. To realise this, we need only to look in our syncmap on the kernel context and check that we have synchronized against all active rings. v2: Since all ringbuffer clients currently share the same timeline, we do have to use the gem_context to distinguish clients. As a bonus, include all the tracing used to debug the death inside suspend. v3: Test, test, test. Construct a selftest to exercise and assert the expected behaviour that multiple switch-to-contexts do not emit redundant requests. Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
2018-05-18drm/i915: Store a pointer to intel_context in i915_requestChris Wilson1-17/+17
To ease the frequent and ugly pointer dance of &request->gem_context->engine[request->engine->id] during request submission, store that pointer as request->hw_context. One major advantage that we will exploit later is that this decouples the logical context state from the engine itself. v2: Set mock_context->ops so we don't crash and burn in selftests. Cleanups from Tvrtko. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
2018-05-18drm/i915: Move request->ctx asideChris Wilson1-6/+6
In the next patch, we want to store the intel_context pointer inside i915_request, as it is frequently access via a convoluted dance when submitting the request to hw. Having two context pointers inside i915_request leads to confusion so first rename the existing i915_gem_context pointer to i915_request.gem_context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
2018-05-08drm/i915: Annotate timeline lock nestingChris Wilson1-1/+1
CI noticed <4>[ 23.430701] ============================================ <4>[ 23.430706] WARNING: possible recursive locking detected <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted <4>[ 23.430720] -------------------------------------------- <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] <4>[ 23.430888] but task is already holding lock: <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.430995] other info that might help us debug this: <4>[ 23.431002] Possible unsafe locking scenario: <4>[ 23.431007] CPU0 <4>[ 23.431010] ---- <4>[ 23.431013] lock(&(&timeline->lock)->rlock); <4>[ 23.431021] lock(&(&timeline->lock)->rlock); <4>[ 23.431028] *** DEADLOCK *** <4>[ 23.431036] May be due to missing lock nesting notation <4>[ 23.431044] 5 locks held by systemd-udevd/169: <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.431369] stack backtrace: <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 <4>[ 23.431394] Call Trace: <4>[ 23.431403] dump_stack+0x67/0x9b <4>[ 23.431411] __lock_acquire+0xc67/0x1b50 <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0 <4>[ 23.431429] ? lock_acquire+0xa6/0x210 <4>[ 23.431435] lock_acquire+0xa6/0x210 <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431540] _raw_spin_lock+0x2a/0x40 <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915] <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915] <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915] <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915] <4>[ 23.432120] submit_notify+0x8d/0x124 [i915] <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915] <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915] <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915] <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915] <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0 <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915] <4>[ 23.432574] pci_device_probe+0xa1/0x130 <4>[ 23.432582] driver_probe_device+0x306/0x480 <4>[ 23.432589] __driver_attach+0xb7/0xe0 <4>[ 23.432596] ? driver_probe_device+0x480/0x480 <4>[ 23.432602] ? driver_probe_device+0x480/0x480 <4>[ 23.432609] bus_for_each_dev+0x74/0xc0 <4>[ 23.432616] bus_add_driver+0x15f/0x250 <4>[ 23.432623] ? 0xffffffffa02d7000 <4>[ 23.432629] driver_register+0x52/0xc0 <4>[ 23.432635] ? 0xffffffffa02d7000 <4>[ 23.432642] do_one_initcall+0x58/0x370 <4>[ 23.432653] ? do_init_module+0x1d/0x1ea <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80 <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0 <4>[ 23.432675] do_init_module+0x56/0x1ea <4>[ 23.432682] load_module+0x2435/0x2b20 <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432710] do_syscall_64+0x55/0x190 <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 23.432724] RIP: 0033:0x7fa780782839 <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839 <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780 <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 but did not report it as an issue as it only occurred during the first module on boot. This is due to the removal of the distinct global timeline, and its separate lock class. So instead mark up the expected nesting. An alternative would be to define a separate lock class for the engine, but since we only expect to have a single point of nesting, we can avoid having multiple lock classes for the struct. Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Tested-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180508153514.20251-1-chris@chris-wilson.co.uk
2018-05-08drm/i915: Disable tasklet scheduling across initial schedulingChris Wilson1-3/+2
During request submission, we call the engine->schedule() function so that we may reorder the active requests as required for inheriting the new request's priority. This may schedule several tasklets to run on the local CPU, but we will need to schedule the tasklets again for the new request. Delay all the local tasklets until the end, so that we only have to process the queue just once. v2: Beware PREEMPT_RCU, as then local_bh_disable() is then not a superset of rcu_read_lock(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-2-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2018-05-04drm/i915: Remove assertion of active_rings must be non-empty if active_requestsChris Wilson1-3/+0
"An outstanding request must still be on an active ring somewhere" is only true if we haven't just been interrupted by the shrinker in the middle of allocating the request itself. (At the start of i915_request_alloc() we pin the context and prepare the GT for activity, marking it as active, and then try to allocate the request. If this allocation invokes the shrinker, we try to reclaim some space by calling i915_retire_requests() which may then be confused by the pre-reservation of active_requests.) <3>[ 125.472695] i915_retire_requests:1429 GEM_BUG_ON(list_empty(&i915->gt.active_rings)) <2>[ 125.472792] kernel BUG at drivers/gpu/drm/i915/i915_request.c:1429! <4>[ 125.472822] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI <4>[ 125.498764] Modules linked in: snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb btrtl btbcm btintel cdc_ether snd_hda_codec_realtek bluetooth i915 snd_hda_codec_generic usbnet r8152 mii ecdh_generic lpc_ich mei_me snd_hda_intel snd_hda_codec mei snd_hwdep snd_hda_core snd_pcm prime_numbers <4>[ 125.498923] CPU: 0 PID: 1115 Comm: gem_exec_create Tainted: G U 4.17.0-rc3-gc49cbe0d1eb8-kasan_32+ #1 <4>[ 125.498955] Hardware name: GOOGLE Peppy/Peppy, BIOS MrChromebox 02/04/2018 <4>[ 125.499074] RIP: 0010:i915_retire_requests+0x3f2/0x590 [i915] <4>[ 125.499095] RSP: 0018:ffff88004e5dec40 EFLAGS: 00010282 <4>[ 125.499117] RAX: 0000000000000010 RBX: ffff8800458f0000 RCX: 0000000000000000 <4>[ 125.499140] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff880060c2f6f0 <4>[ 125.499164] RBP: ffff88004e5dee30 R08: ffffed000c185ee6 R09: ffffed000c185ee6 <4>[ 125.499187] R10: 0000000000000001 R11: ffffed000c185ee5 R12: ffff8800553da160 <4>[ 125.499210] R13: dffffc0000000000 R14: 0000000000000000 R15: ffff8800458faed0 <4>[ 125.499235] FS: 00007fe18f052980(0000) GS:ffff880065400000(0000) knlGS:0000000000000000 <4>[ 125.499262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 125.499282] CR2: 00007f01df11efb8 CR3: 00000000518d4001 CR4: 00000000000606f0 <4>[ 125.499304] Call Trace: <4>[ 125.499417] i915_gem_shrink+0x576/0xb50 [i915] <4>[ 125.499532] ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915] <4>[ 125.499561] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.499671] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915] <4>[ 125.499782] ? i915_gem_shrinker_scan+0xc4/0x320 [i915] <4>[ 125.499889] i915_gem_shrinker_scan+0xc4/0x320 [i915] <4>[ 125.499997] ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915] <4>[ 125.500021] ? do_raw_spin_unlock+0x4f/0x240 <4>[ 125.500042] ? _raw_spin_unlock+0x29/0x40 <4>[ 125.500149] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915] <4>[ 125.500177] shrink_slab.part.18+0x23e/0x8f0 <4>[ 125.500202] ? unregister_shrinker+0x1f0/0x1f0 <4>[ 125.500226] ? mem_cgroup_iter+0x379/0xcc0 <4>[ 125.500249] shrink_node+0xa7e/0x1180 <4>[ 125.500276] ? shrink_node_memcg+0x11f0/0x11f0 <4>[ 125.500297] ? __delayacct_freepages_start+0x38/0x80 <4>[ 125.500319] ? __is_insn_slot_addr+0xe3/0x1a0 <4>[ 125.500342] ? recalibrate_cpu_khz+0x10/0x10 <4>[ 125.500361] ? ktime_get+0xb2/0x140 <4>[ 125.500382] do_try_to_free_pages+0x2d3/0xe40 <4>[ 125.500407] ? allow_direct_reclaim.part.23+0x1e0/0x1e0 <4>[ 125.500429] ? shrink_node+0x1180/0x1180 <4>[ 125.500450] ? __read_once_size_nocheck.constprop.4+0x10/0x10 <4>[ 125.500476] try_to_free_pages+0x1af/0x560 <4>[ 125.500497] ? do_try_to_free_pages+0xe40/0xe40 <4>[ 125.500525] __alloc_pages_nodemask+0xadc/0x2130 <4>[ 125.500553] ? gfp_pfmemalloc_allowed+0x150/0x150 <4>[ 125.500654] ? i915_gem_do_execbuffer+0x219d/0x32e0 [i915] <4>[ 125.500678] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.500701] ? __debug_object_init+0x322/0xd90 <4>[ 125.500722] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.500827] ? i915_gem_do_execbuffer+0xdc2/0x32e0 [i915] <4>[ 125.500942] ? i915_request_alloc+0x5b5/0x13f0 [i915] <4>[ 125.500964] ? page_frag_free+0x170/0x170 <4>[ 125.500984] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.501008] new_slab+0x21d/0x5c0 <4>[ 125.501029] ___slab_alloc.constprop.35+0x322/0x3e0 <4>[ 125.501052] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501074] ? __ww_mutex_lock.constprop.3+0x1104/0x2cf0 <4>[ 125.501097] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 125.501120] ? fs_reclaim_acquire+0x10/0x10 <4>[ 125.501138] ? lock_acquire+0x138/0x3c0 <4>[ 125.501156] ? lock_acquire+0x3c0/0x3c0 <4>[ 125.501176] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501198] ? __slab_alloc.isra.27.constprop.34+0x3d/0x70 <4>[ 125.501219] __slab_alloc.isra.27.constprop.34+0x3d/0x70 <4>[ 125.501243] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501265] __kmalloc_track_caller+0x313/0x350 <4>[ 125.501287] krealloc+0x62/0xb0 <4>[ 125.501305] reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501411] i915_gem_do_execbuffer+0x2040/0x32e0 [i915] <4>[ 125.501522] ? eb_relocate_slow+0xad0/0xad0 [i915] <4>[ 125.501544] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.501646] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.501755] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.501779] ? drm_dev_get+0x20/0x20 <4>[ 125.501803] ? __might_fault+0xea/0x1a0 <4>[ 125.501902] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.502012] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502116] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502218] i915_gem_execbuffer2_ioctl+0x3c5/0x770 [i915] <4>[ 125.502243] ? drm_dev_enter+0xe0/0xe0 <4>[ 125.502260] ? lock_acquire+0x138/0x3c0 <4>[ 125.502362] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502470] ? i915_gem_object_create.part.28+0x570/0x570 [i915] <4>[ 125.502575] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502680] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502702] drm_ioctl_kernel+0x151/0x200 <4>[ 125.502721] ? drm_ioctl_permit+0x2a0/0x2a0 <4>[ 125.502746] drm_ioctl+0x63a/0x920 <4>[ 125.502844] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502868] ? drm_getstats+0x20/0x20 <4>[ 125.502886] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.502919] do_vfs_ioctl+0x173/0xe90 <4>[ 125.502936] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.502957] ? ioctl_preallocate+0x170/0x170 <4>[ 125.502978] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.503002] ? retint_kernel+0x2d/0x2d <4>[ 125.503024] ksys_ioctl+0x35/0x60 <4>[ 125.503043] __x64_sys_ioctl+0x6a/0xb0 <4>[ 125.503061] do_syscall_64+0x97/0x400 <4>[ 125.503081] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 125.503101] RIP: 0033:0x7fe18e4f65d7 <4>[ 125.503116] RSP: 002b:00007ffe2ffc06a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 <4>[ 125.503145] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe18e4f65d7 <4>[ 125.503168] RDX: 00007ffe2ffc07f0 RSI: 0000000040406469 RDI: 0000000000000003 <4>[ 125.503191] RBP: 00007ffe2ffc07f0 R08: 0000000000000004 R09: 00007ffe2ffcf080 <4>[ 125.503215] R10: 000000000002c7de R11: 0000000000000246 R12: 0000000040406469 <4>[ 125.503238] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 <4>[ 125.503268] Code: e8 18 a0 c9 da 48 8b 35 25 3a 47 00 49 c7 c0 a0 3b 88 c0 b9 95 05 00 00 48 c7 c2 e0 49 88 c0 48 c7 c7 8d 3b 5d c0 e8 ee 7e db da <0f> 0b 48 89 ef e8 a4 26 f5 da e9 51 fe ff ff e8 8a 26 f5 da e9 <1>[ 125.503548] RIP: i915_retire_requests+0x3f2/0x590 [i915] RSP: ffff88004e5dec40 Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180504101147.26286-1-chris@chris-wilson.co.uk
2018-05-04drm/i915: Keep one request in our ring_listChris Wilson1-3/+3
Don't pre-emptively retire the oldest request in our ring's list if it is the only request. We keep various bits of state alive using the active reference from the request and would rather transfer that state over to a new request rather than the more involved process of retiring and reacquiring it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-2-chris@chris-wilson.co.uk
2018-05-03drm/i915: Reset the hangcheck timestamp before repeating a seqnoChris Wilson1-0/+1
In the unusual circumstance where we reuse a seqno (for example, in igt), make sure that we reset the hangcheck timestamp before it sees the same seqno again. References: https://bugs.freedesktop.org/show_bug.cgi?id=106215 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-1-chris@chris-wilson.co.uk
2018-05-03drm/i915: Split i915_gem_timeline into individual timelinesChris Wilson1-35/+33
We need to move to a more flexible timeline that doesn't assume one fence context per engine, and so allow for a single timeline to be used across a combination of engines. This means that preallocating a fence context per engine is now a hindrance, and so we want to introduce the singular timeline. From the code perspective, this has the notable advantage of clearing up a lot of mirky semantics and some clumsy pointer chasing. By splitting the timeline up into a single entity rather than an array of per-engine timelines, we can realise the goal of the previous patch of tracking the timeline alongside the ring. v2: Tweak wait_for_idle to stop the compiling thinking that ret may be uninitialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk