author | Lyude Paul <lyude@redhat.com> | 2018-11-07 00:30:16 +0300 |
---|---|---|
committer | Lyude Paul <lyude@redhat.com> | 2018-11-07 23:12:30 +0300 |
commit | 9a64c65083b910b3557b317dc56e1e93063ac350 (patch) | |
tree | e1291b8951830a6bfd7115422c2e4b3ea021934b /drivers/gpu/drm/i915/intel_hotplug.c | |
parent | 0759af9e75ca154602e28ef135bf980d1f2f4f30 (diff) | |
drm/i915: Add short HPD IRQ storm detection for non-MST systems
Unfortunately, it seems that the HPD IRQ storm problem from the early
days of Intel GPUs was never entirely solved, only mostly. Within the
last couple of days, I got a bug report from one of our customers who
had been having issues with their machine suddenly booting up very
slowly after having updated. Boot time went from around 30 seconds to
consistently over 6 minutes.
After some investigation, I discovered that i915 was reporting massive
amounts of short HPD IRQ spam on this system from the DisplayPort port,
despite there not being anything actually connected. The symptoms would
start with one "long" HPD IRQ being detected at boot:
[ 1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
[ 1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
[ 1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
[ 1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
[ 1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
[ 1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
[ 1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
…
followed by constant short IRQs afterwards:
[ 1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
[ 1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
[ 1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
[ 1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[ 1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[ 1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[ 1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[ 1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
[ 1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[ 1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[ 1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[ 1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[ 1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
The customer's system in question has a GM45 GPU, which is apparently
well known for hotplugging storms.
So, work around this impressively broken hardware by changing the default
HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
short IRQs count for 1. This makes it so that 5 long IRQs will trigger
an HPD storm, and on systems with short HPD storm detection 50 short
IRQs will trigger an HPD storm. 50 short IRQs amount to 100ms of
constant pulsing, which seems like a good middle ground between being too
sensitive and not being sensitive enough (which would cause visible
stutters in userspace every time a storm occurs).
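The weighted counting described above can be sketched in isolation. This is a
minimal userspace model, not the driver code: struct pin_stats,
storm_detect(), pulses_until_storm(), the 1000ms detection window, and the
2ms pulse spacing (inferred from "50 short IRQs amount to 100ms") are all
assumptions made for illustration. Only the increment of 10 or 1, the
threshold of 50, and the strict "count > threshold" comparison are taken
from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-pin stats; not the i915 structures. */
struct pin_stats {
	unsigned long window_start_ms;	/* start of the current detection window */
	int count;			/* weighted IRQ count within the window */
	bool disabled;			/* set once a storm has been detected */
};

#define DETECT_PERIOD_MS	1000	/* assumed detection window length */
#define DEFAULT_THRESHOLD	50	/* default threshold after this patch */

/*
 * Model of the weighted storm check: long pulses add 10, short pulses add 1,
 * and short pulses are ignored unless short storm detection is enabled.
 */
static bool storm_detect(struct pin_stats *s, unsigned long now_ms,
			 bool long_hpd, bool short_storm_enabled,
			 int threshold)
{
	const int increment = long_hpd ? 10 : 1;

	if (!threshold || (!long_hpd && !short_storm_enabled))
		return false;

	/* Start a fresh window once the previous one has expired. */
	if (now_ms - s->window_start_ms > DETECT_PERIOD_MS) {
		s->window_start_ms = now_ms;
		s->count = 0;
	}

	s->count += increment;
	if (s->count > threshold) {
		s->disabled = true;
		return true;
	}

	return false;
}

/*
 * Feed identical pulses 2ms apart (an assumed spacing) and return the index
 * of the pulse that trips storm detection, or -1 if none of them does.
 */
static int pulses_until_storm(bool long_hpd, bool short_storm_enabled,
			      int max_pulses)
{
	struct pin_stats s = { 0, 0, false };
	int i;

	for (i = 1; i <= max_pulses; i++) {
		if (storm_detect(&s, (unsigned long)i * 2, long_hpd,
				 short_storm_enabled, DEFAULT_THRESHOLD))
			return i;
	}

	return -1;
}
```

Because the comparison is strict, in this model the count has to exceed 50
rather than reach it: pulses_until_storm(true, false, 100) returns 6,
pulses_until_storm(false, true, 100) returns 51, and with short storm
detection disabled short pulses never trip the detector.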
And just to be extra safe: we don't enable this by default on systems
with MST support. There's too high a chance of MST support triggering
storm detection, and systems that are new enough to support MST are a
lot less likely to have issues with IRQ storms anyway.
As a note: this patch was tested using a ThinkPad T450s and a Chamelium
to simulate the short IRQ storms.
Changes since v1:
- Don't use two separate thresholds, just make long IRQs count for 10
each and short IRQs count for 1. This simplifies the code a bit
- Ville Syrjälä
Changes since v2:
- Document @long_hpd in intel_hpd_irq_storm_detect, no functional
changes
Changes since v4:
- Remove !! in long_hpd assignment - Ville Syrjälä
- queue_hp = true - Ville Syrjälä
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-6-lyude@redhat.com
Diffstat (limited to 'drivers/gpu/drm/i915/intel_hotplug.c')
-rw-r--r-- | drivers/gpu/drm/i915/intel_hotplug.c | 50 |
1 files changed, 30 insertions, 20 deletions
diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index d642c0795452..42e61e10f517 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -114,34 +114,46 @@ enum hpd_pin intel_hpd_pin_default(struct drm_i915_private *dev_priv,
 #define HPD_STORM_REENABLE_DELAY (2 * 60 * 1000)
 
 /**
- * intel_hpd_irq_storm_detect - gather stats and detect HPD irq storm on a pin
+ * intel_hpd_irq_storm_detect - gather stats and detect HPD IRQ storm on a pin
  * @dev_priv: private driver data pointer
  * @pin: the pin to gather stats on
+ * @long_hpd: whether the HPD IRQ was long or short
  *
- * Gather stats about HPD irqs from the specified @pin, and detect irq
+ * Gather stats about HPD IRQs from the specified @pin, and detect IRQ
  * storms. Only the pin specific stats and state are changed, the caller is
  * responsible for further action.
  *
- * The number of irqs that are allowed within @HPD_STORM_DETECT_PERIOD is
+ * The number of IRQs that are allowed within @HPD_STORM_DETECT_PERIOD is
  * stored in @dev_priv->hotplug.hpd_storm_threshold which defaults to
- * @HPD_STORM_DEFAULT_THRESHOLD. If this threshold is exceeded, it's
- * considered an irq storm and the irq state is set to @HPD_MARK_DISABLED.
+ * @HPD_STORM_DEFAULT_THRESHOLD. Long IRQs count as +10 to this threshold, and
+ * short IRQs count as +1. If this threshold is exceeded, it's considered an
+ * IRQ storm and the IRQ state is set to @HPD_MARK_DISABLED.
+ *
+ * By default, most systems will only count long IRQs towards
+ * &dev_priv->hotplug.hpd_storm_threshold. However, some older systems also
+ * suffer from short IRQ storms and must also track these. Because short IRQ
+ * storms are naturally caused by sideband interactions with DP MST devices,
+ * short IRQ detection is only enabled for systems without DP MST support.
+ * Systems which are new enough to support DP MST are far less likely to
+ * suffer from IRQ storms at all, so this is fine.
  *
  * The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs,
  * and should only be adjusted for automated hotplug testing.
  *
- * Return true if an irq storm was detected on @pin.
+ * Return true if an IRQ storm was detected on @pin.
  */
 static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
-				       enum hpd_pin pin)
+				       enum hpd_pin pin, bool long_hpd)
 {
 	struct i915_hotplug *hpd = &dev_priv->hotplug;
 	unsigned long start = hpd->stats[pin].last_jiffies;
 	unsigned long end = start + msecs_to_jiffies(HPD_STORM_DETECT_PERIOD);
+	const int increment = long_hpd ? 10 : 1;
 	const int threshold = hpd->hpd_storm_threshold;
 	bool storm = false;
 
-	if (!threshold)
+	if (!threshold ||
+	    (!long_hpd && !dev_priv->hotplug.hpd_short_storm_enabled))
 		return false;
 
 	if (!time_in_range(jiffies, start, end)) {
@@ -149,7 +161,8 @@ static bool intel_hpd_irq_storm_detect(struct drm_i915_private *dev_priv,
 		hpd->stats[pin].count = 0;
 	}
 
-	if (++hpd->stats[pin].count > threshold) {
+	hpd->stats[pin].count += increment;
+	if (hpd->stats[pin].count > threshold) {
 		hpd->stats[pin].state = HPD_MARK_DISABLED;
 		DRM_DEBUG_KMS("HPD interrupt storm detected on PIN %d\n", pin);
 		storm = true;
@@ -409,28 +422,24 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 	for_each_intel_encoder(&dev_priv->drm, encoder) {
 		enum hpd_pin pin = encoder->hpd_pin;
 		bool has_hpd_pulse = intel_encoder_has_hpd_pulse(encoder);
+		bool long_hpd = true;
 
 		if (!(BIT(pin) & pin_mask))
 			continue;
 
 		if (has_hpd_pulse) {
-			bool long_hpd = long_mask & BIT(pin);
 			enum port port = encoder->port;
 
+			long_hpd = long_mask & BIT(pin);
+
 			DRM_DEBUG_DRIVER("digital hpd port %c - %s\n", port_name(port),
 					 long_hpd ? "long" : "short");
-			/*
-			 * For long HPD pulses we want to have the digital queue happen,
-			 * but we still want HPD storm detection to function.
-			 */
 			queue_dig = true;
-			if (long_hpd) {
+			if (long_hpd)
 				dev_priv->hotplug.long_port_mask |= (1 << port);
-			} else {
-				/* for short HPD just trigger the digital queue */
+			else
 				dev_priv->hotplug.short_port_mask |= (1 << port);
-				continue;
-			}
+
 		}
 
 		if (dev_priv->hotplug.stats[pin].state == HPD_DISABLED) {
@@ -453,9 +462,10 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 			queue_hp = true;
 		}
 
-		if (intel_hpd_irq_storm_detect(dev_priv, pin)) {
+		if (intel_hpd_irq_storm_detect(dev_priv, pin, long_hpd)) {
 			dev_priv->hotplug.event_bits &= ~BIT(pin);
 			storm_detected = true;
+			queue_hp = true;
 		}
 	}
 