drm/i915: Implement inter-engine read-read optimisations

Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
author: Chris Wilson <chris@chris-wilson.co.uk> 2015-04-27 15:41:17 +0300
committer: Daniel Vetter <daniel.vetter@ffwll.ch> 2015-05-21 16:11:42 +0300
commit: b47161858ba13c9c7e03333132230d66e008dd55 (patch)
tree: 164a2af09b14295bbe0472b451d9092b6d5e5af9 /drivers/gpu/drm/i915/intel_overlay.c
parent: eed29a5b21557476bbd8b141946a8dfe5aacc4f3 (diff)
download: linux-b47161858ba13c9c7e03333132230d66e008dd55.tar.xz
1 files changed, 0 insertions, 2 deletions
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 5fd2d5ac02e2..25c8ec697da1 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -228,7 +228,6 @@ static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
 	ret = i915_wait_request(overlay->last_flip_req);
 	if (ret)
 		return ret;
-	i915_gem_retire_requests(dev);
 
 	i915_gem_request_assign(&overlay->last_flip_req, NULL);
 	return 0;
@@ -376,7 +375,6 @@ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
 	ret = i915_wait_request(overlay->last_flip_req);
 	if (ret)
 		return ret;
-	i915_gem_retire_requests(overlay->dev);
 
 	if (overlay->flip_tail)
 		overlay->flip_tail(overlay);
author	Chris Wilson <chris@chris-wilson.co.uk>	2015-04-27 15:41:17 +0300
committer	Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-21 16:11:42 +0300
commit	b47161858ba13c9c7e03333132230d66e008dd55 (patch)
tree	164a2af09b14295bbe0472b451d9092b6d5e5af9 /drivers/gpu/drm/i915/intel_overlay.c
parent	eed29a5b21557476bbd8b141946a8dfe5aacc4f3 (diff)
download	linux-b47161858ba13c9c7e03333132230d66e008dd55.tar.xz