diff options
author | Mika Kuoppala <mika.kuoppala@linux.intel.com> | 2016-11-18 16:09:04 +0300 |
---|---|---|
committer | Mika Kuoppala <mika.kuoppala@intel.com> | 2016-11-21 15:36:40 +0300 |
commit | 3fe3b030bd2d7a51c12aa6fe0e5178b9f1a726ec (patch) | |
tree | 225cdffc34ad23568ea900ecaa8045ed0273cff1 /drivers/gpu/drm/i915/intel_ringbuffer.h | |
parent | 6e16d028e441b0b2c141aaecb39f4838cd2964b5 (diff) | |
download | linux-3fe3b030bd2d7a51c12aa6fe0e5178b9f1a726ec.tar.xz |
drm/i915: Decouple hang detection from hangcheck period
Hangcheck state accumulation has gained more steps
along the years, like head movement and more recently the
subunit inactivity check. As the subunit sampling is only
done if the previous state check showed inactivity, we
have added more stages (and time) to reach a hang verdict.
Asymmetric engine states led to different actual weight of
'one hangcheck unit' and it was demonstrated in some
hangs that due to difference in stages, simpler engines
were accused falsely of a hang as their scoring was much
more quicker to accumulate above the hang treshold.
To completely decouple the hangcheck guilty score
from the hangcheck period, convert hangcheck score to a
rough period of inactivity measurement. As these are
tracked as jiffies, they are meaningful also across
reset boundaries. This makes finding a guilty engine
more accurate across multi engine activity scenarios,
especially across asymmetric engines.
We lose the ability to detect cross batch malicious attempts
to hinder the progress. Plan is to move this functionality
to be part of context banning which is more natural fit,
later in the series.
v2: use time_before macros (Chris)
reinstate the pardoning of moving engine after hc (Chris)
v3: avoid global state for per engine stall detection (Chris)
v4: take timeline last retirement into account (Chris)
v5: do debug print on pardoning, split out retirement timestamp (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Diffstat (limited to 'drivers/gpu/drm/i915/intel_ringbuffer.h')
-rw-r--r-- | drivers/gpu/drm/i915/intel_ringbuffer.h | 40 |
1 files changed, 31 insertions, 9 deletions
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 3152b2b4a202..3f43adefd1c0 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -65,16 +65,37 @@ struct intel_hw_status_page { GEN8_SEMAPHORE_OFFSET(from, (__ring)->id)) enum intel_engine_hangcheck_action { - HANGCHECK_IDLE = 0, - HANGCHECK_WAIT, - HANGCHECK_ACTIVE_SEQNO, - HANGCHECK_ACTIVE_HEAD, - HANGCHECK_ACTIVE_SUBUNITS, - HANGCHECK_KICK, - HANGCHECK_HUNG, + ENGINE_IDLE = 0, + ENGINE_WAIT, + ENGINE_ACTIVE_SEQNO, + ENGINE_ACTIVE_HEAD, + ENGINE_ACTIVE_SUBUNITS, + ENGINE_WAIT_KICK, + ENGINE_DEAD, }; -#define HANGCHECK_SCORE_RING_HUNG 31 +static inline const char * +hangcheck_action_to_str(const enum intel_engine_hangcheck_action a) +{ + switch (a) { + case ENGINE_IDLE: + return "idle"; + case ENGINE_WAIT: + return "wait"; + case ENGINE_ACTIVE_SEQNO: + return "active seqno"; + case ENGINE_ACTIVE_HEAD: + return "active head"; + case ENGINE_ACTIVE_SUBUNITS: + return "active subunits"; + case ENGINE_WAIT_KICK: + return "wait kick"; + case ENGINE_DEAD: + return "dead"; + } + + return "unknown"; +} #define I915_MAX_SLICES 3 #define I915_MAX_SUBSLICES 3 @@ -106,10 +127,11 @@ struct intel_instdone { struct intel_engine_hangcheck { u64 acthd; u32 seqno; - int score; enum intel_engine_hangcheck_action action; + unsigned long action_timestamp; int deadlock; struct intel_instdone instdone; + bool stalled; }; struct intel_ring { |