summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/intel_overlay.c
diff options
context:
space:
mode:
authorChris Wilson <chris@chris-wilson.co.uk>2015-04-27 15:41:17 +0300
committerDaniel Vetter <daniel.vetter@ffwll.ch>2015-05-21 16:11:42 +0300
commitb47161858ba13c9c7e03333132230d66e008dd55 (patch)
tree164a2af09b14295bbe0472b451d9092b6d5e5af9 /drivers/gpu/drm/i915/intel_overlay.c
parenteed29a5b21557476bbd8b141946a8dfe5aacc4f3 (diff)
downloadlinux-b47161858ba13c9c7e03333132230d66e008dd55.tar.xz
drm/i915: Implement inter-engine read-read optimisations
Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Diffstat (limited to 'drivers/gpu/drm/i915/intel_overlay.c')
-rw-r--r--drivers/gpu/drm/i915/intel_overlay.c2
1 files changed, 0 insertions, 2 deletions
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 5fd2d5ac02e2..25c8ec697da1 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -228,7 +228,6 @@ static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
ret = i915_wait_request(overlay->last_flip_req);
if (ret)
return ret;
- i915_gem_retire_requests(dev);
i915_gem_request_assign(&overlay->last_flip_req, NULL);
return 0;
@@ -376,7 +375,6 @@ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
ret = i915_wait_request(overlay->last_flip_req);
if (ret)
return ret;
- i915_gem_retire_requests(overlay->dev);
if (overlay->flip_tail)
overlay->flip_tail(overlay);