summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/intel_reset.c
AgeCommit message (Collapse)AuthorFilesLines
2021-02-02drm/i915/gt: Remove references to struct drm_device.pdevThomas Zimmermann1-3/+3
Using struct drm_device.pdev is deprecated. Convert i915 to struct drm_device.dev. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210128133127.2311-3-tzimmermann@suse.de
2021-01-15Merge tag 'drm-intel-gt-next-2021-01-14' of ↵Dave Airlie1-34/+62
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - Deprecate I915_PMU_LAST and optimize state tracking (Tvrtko) Avoid relying on last item ABI marker in i915_drm.h, add a comment to mark as deprecated. Cross-subsystem Changes: Core Changes: Driver Changes: - Restore clear residuals security mitigations for Ivybridge and Baytrail (Chris) - Close #1858: Allow sysadmin to choose applied GPU security mitigations through i915.mitigations=... similar to CPU (Chris) - Fix for #2024: GPU hangs on HSW GT1 (Chris) - Fix for #2707: Driver hang when editing UVs in Blender (Chris, Ville) - Fix for #2797: False positive GuC loading error message (Chris) - Fix for #2859: Missing GuC firmware for older Cometlakes (Chris) - Lessen probability of GPU hang due to DMAR faults [reason 7, next page table ptr is invalid] on Tigerlake (Chris) - Fix REVID macros for TGL to fetch correct stepping (Aditya) - Limit frequency drop to RPe on parking (Chris, Edward) - Limit W/A 1406941453 to TGL, RKL and DG1 (Swathi) - Make W/A 22010271021 permanent on DG1 (Lucas) - Implement W/A 16011163337 to prevent a HS/DS hang on DG1 (Swathi) - Only disable preemption on gen8 render engines (Chris) - Disable arbitration around Braswell's PDP updates (Chris) - Disable arbitration on no-preempt requests (Chris) - Check for arbitration after writing start seqno before busywaiting (Chris) - Retain default context state across shrinking (Venkata, CQ) - Fix mismatch between misplaced vma check and vma insert for 32-bit addressing userspaces (Chris, CQ) - Propagate error for vmap() failure instead kernel NULL deref (Chris) - Propagate error from cancelled submit due to context closure immediately (Chris) - Fix RCU race on HWSP tracking per request (Chris) - Clear CMD parser shadow and GPU reloc batches (Matt A) - Populate logical context during first pin (Maarten) - Optimistically prune dma-resv from the shrinker (Chris) - Fix for virtual engine ownership race (Chris) - Remove timeslice suppression to restore fairness for virtual engines (Chris) - Rearrange IVB/HSW workarounds properly between GT and engine (Chris) - Taint the reset mutex with the shrinker (Chris) - Replace direct submit with direct call to tasklet (Chris) - Multiple corrections to virtual engine dequeue and breadcrumbs code (Chris) - Avoid wakeref from potentially hard IRQ context in PMU (Tvrtko) - Use raw clock for RC6 time estimation in PMU (Tvrtko) - Differentiate OOM failures from invalid map types (Chris) - Fix Gen9 to have 64 MOCS entries similar to Gen11 (Chris) - Ignore repeated attempts to suspend request flow across reset (Chris) - Remove livelock from "do_idle_maps" VT-d W/A (Chris) - Cancel the preemption timeout early in case engine reset fails (Chris) - Code flow optimization in the scheduling code (Chris) - Clear the execlists timers upon reset (Chris) - Drain the breadcrumbs just once (Chris, Matt A) - Track the overall GT awake/busy time (Chris) - Tweak submission tasklet flushing to avoid starvation (Chris) - Track timelines created using the HWSP to restore on resume (Chris) - Use cmpxchg64 for 32b compatilibity for active tracking (Chris) - Prefer recycling an idle GGTT fence to avoid GPU wait (Chris) - Restructure GT code organization for clearer split between GuC and execlists (Chris, Daniele, John, Matt A) - Remove GuC code that will remain unused by new interfaces (Matt B) - Restructure the CS timestamp clocks code to local to GT (Chris) - Fix error return paths in perf code (Zhang) - Replace idr_init() by idr_init_base() in perf (Deepak) - Fix shmem_pin_map error path (Colin) - Drop redundant free_work worker for GEM contexts (Chris, Mika) - Increase readability and understandability of intel_workarounds.c (Lucas) - Defer enabling the breadcrumb interrupt to after submission (Chris) - Deal with buddy alloc block sizes beyond 4G (Venkata, Chris) - Encode fence specific waitqueue behaviour into the wait.flags (Chris) - Don't cancel the breadcrumb interrupt shadow too early (Chris) - Cancel submitted requests upon context reset (Chris) - Use correct locks in GuC code (Tvrtko) - Prevent use of engine->wa_ctx after error (Chris, Matt R) - Fix build warning on 32-bit (Arnd) - Avoid memory leak if platform would have more than 16 W/A (Tvrtko) - Avoid unnecessary #if CONFIG_PM in PMU code (Chris, Tvrtko) - Improve debugging output (Chris, Tvrtko, Matt R) - Make file local variables static (Jani) - Avoid uint*_t types in i915 (Jani) - Selftest improvements (Chris, Matt A, Dan) - Documentation fixes (Chris, Jose) Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/gt/intel_breadcrumbs.c # drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h # drivers/gpu/drm/i915/gt/intel_lrc.c # drivers/gpu/drm/i915/gvt/mmio_context.h # drivers/gpu/drm/i915/i915_drv.h From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114152232.GA21588@jlahtine-mobl.ger.corp.intel.com
2021-01-05drm/i915/gt: Allow failed resets without assertionChris Wilson1-0/+3
If the engine reset fails, we will attempt to resume with the current inflight submissions. When that happens, we cannot assert that the engine reset cleared the pending submission, so do not. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2878 Fixes: 16f2941ad307 ("drm/i915/gt: Replace direct submit with direct call to tasklet") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210104115145.24460-3-chris@chris-wilson.co.uk
2020-12-30drm/i915/gt: Taint the reset mutex with the shrinkerChris Wilson1-0/+11
Declare that, under extreme circumstances, the shrinker may need to wait upon a request, in which case reset must not itself deadlock in order to ensure forward progress of the driver. That is since the shrinker may depend upon a reset, any reset cannot touch the shrinker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201229141626.4773-1-chris@chris-wilson.co.uk
2020-12-24drm/i915/gt: Replace direct submit with direct call to taskletChris Wilson1-22/+38
Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"). v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
2020-12-04drm/i915/gt: Include reset failures in the traceChris Wilson1-12/+10
The GT and engine reset failures are completely invisible when looking at a trace for a bug, but are vital to understanding the incomplete flow. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-3-chris@chris-wilson.co.uk
2020-12-03Merge tag 'drm-intel-next-queued-2020-11-27' of ↵Dave Airlie1-2/+2
git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 features for v5.11: Highlights: - Enable big joiner to join two pipes to one port to overcome pipe restrictions (Manasi, Ville, Maarten) Display: - More DG1 enabling (Lucas, Aditya) - Fixes to cases without display (Lucas, José, Jani) - Initial PSR state improvements (José) - JSL eDP vswing updates (Tejas) - Handle EDID declared max 16 bpc (Ville) - Display refactoring (Ville) Other: - GVT features - Backmerge Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87czzzkk1s.fsf@intel.com
2020-11-11drm/i915/display: add namespace to intel_finish_resetLucas De Marchi1-1/+1
Rename intel_finish_reset to intel_display_finish_reset, so it's clear from gt/ that we are calling out the display code. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201106225531.920641-2-lucas.demarchi@intel.com
2020-11-11drm/i915/display: add namespace to intel_prepare_resetLucas De Marchi1-1/+1
Rename intel_prepare_reset to intel_display_prepare_reset, so it's clear from gt/ that we are calling out the display code. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201106225531.920641-1-lucas.demarchi@intel.com
2020-11-09drm/i915: Improve record of hung engines in error stateTvrtko Ursulin1-1/+1
Between events which trigger engine and GPU resets and capturing the error state we lose information on which engine triggered the reset. Improve this by passing in the hung engine mask down to error capture. Result is that the list of engines in user visible "GPU HANG: ecode <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just active engines. Most importantly the displayed process is now the one which was actually hung. v2: * Stub prototype. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
2020-09-30drm/i915/gt: Retire cancelled requests on unloadChris Wilson1-0/+2
If we manage to hit the intel_gt_set_wedged_on_fini() while active, i.e. module unload during a stress test, we may cancel the requests but not clean up. This leads to a very slow module unload as we wait for something or other to trigger the retirement flushing, or timeout and unload with a bunch of warnings. Instead if we explicitly cancel then cleanup on an active unload, it should be instant and quiet. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-3-chris@chris-wilson.co.uk
2020-09-07drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbsChris Wilson1-0/+1
On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-07-08drm/i915: Move the engine mask to intel_gt_infoDaniele Ceraolo Spurio1-3/+3
Since the engines belong to the GT, move the runtime-updated list of available engines to the intel_gt struct. The original mask has been renamed to indicate it contains the maximum engine list that can be found on a matching device. In preparation for other info being moved to the gt in follow up patches (sseu), introduce an intel_gt_info structure to group all gt-related runtime info. v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
2020-07-06drm/i915: Print caller when tainting for CIMichał Winiarski1-3/+3
We can add taint from multiple places, printing the caller allows us to have a better overview of what exactly caused us to do the tainting. v2: Tweak format and print the device (Chris) v3: Move things around (Chris) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-2-michal@hardline.pl
2020-07-06drm/i915: Reboot CI if we get wedged during driver initMichał Winiarski1-2/+11
Getting wedged device on driver init is pretty much unrecoverable. Since we're running various scenarios that may potentially hit this in CI (module reload / selftests / hotunplug), and if it happens, it means that we can't trust any subsequent CI results, we should just apply the taint to let the CI know that it should reboot (CI checks taint between test runs). v2: Comment that WEDGED_ON_INIT is non-recoverable, distinguish WEDGED_ON_INIT from WEDGED_ON_FINI (Chris) v3: Appease checkpatch, fixup search-replace logic expression mindbomb in assert (Chris) Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-1-michal@hardline.pl
2020-06-22drm/i915/params: switch to device specific parametersJani Nikula1-3/+3
Start using device specific parameters instead of module parameters for most things. The module parameters become the immutable initial values for i915 parameters. The device specific parameters in i915->params start life as a copy of i915_modparams. Any later changes are only reflected in the debugfs. The stragglers are: * i915.force_probe and i915.modeset. Needed before dev_priv is available. This is fine because the parameters are read-only and never modified. * i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and I915_STATE_WARN_ON would result in massive and ugly churn. This is handled by not exposing the parameter via debugfs, and leaving the parameter writable in sysfs. This may be fixed up in follow-up work. * i915.inject_probe_failure. Only makes sense in terms of the module, not the device. This is handled by not exposing the parameter via debugfs. v2: Fix uc i915 lookup code (Michał Winiarski) Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
2020-04-08drm/i915/gt: prefer struct drm_device based loggingJani Nikula1-7/+7
Prefer struct drm_device based logging over struct device based logging. No functional changes. Cc: Wambui Karuga <wambui.karugax@gmail.com> Reviewed-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402114819.17232-16-jani.nikula@intel.com
2020-03-20drm/i915/gt: Cancel a hung context if already closedChris Wilson1-0/+5
Use the restored ability to check if a context is closed to decide whether or not to immediately ban the context from further execution after a hang. Fixes: be90e344836a ("drm/i915/gt: Cancel banned contexts after GT reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200319170707.8262-2-chris@chris-wilson.co.uk
2020-03-16drm/i915: Move GGTT fence registers under gt/Chris Wilson1-1/+1
Since the fence registers control HW detiling through the GGTT aperture, make them a part of the intel_ggtt under gt/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-1-chris@chris-wilson.co.uk
2020-03-04drm/i915/gt: Cancel banned contexts after GT resetChris Wilson1-7/+1
As we started marking the ce->gem_context as NULL on closure, we can no longer use that to carry closure information. Instead, we can look at whether the context was killed on closure instead. Fixes: 130a95e9098e ("drm/i915/gem: Consolidate ctx->engines[] release") Closes: https://gitlab.freedesktop.org/drm/intel/issues/1379 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304165113.2449213-1-chris@chris-wilson.co.uk
2020-03-04drm/i915: Apply i915_request_skip() on submissionChris Wilson1-5/+8
Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk
2020-02-07drm/i915/gt: Only ignore already reset requestsChris Wilson1-1/+1
If a request is being re-run after an innocent reset, it is marked as -EAGAIN. So only skip an engine reset if the request is marked as -EIO. Testcase: igt/gem_ctx_exec/basic-nohangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200207161602.2838218-1-chris@chris-wilson.co.uk
2020-02-01drm/i915: extract engine WA programming to common resume functionDaniele Ceraolo Spurio1-2/+2
The workarounds are a common "feature" across gens and submission mechanisms and we already call the other WA related functions from common engine ones (<setup/cleanup>_common), so it makes sense to do the same with WA application. Medium-term, This will help us reduce the duplication once the GuC resume function is added, but short term it will also allow us to use the workaround lists for pre-gen8 engine workarounds. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200131075716.2212299-2-chris@chris-wilson.co.uk
2020-01-30drm/i915/reset: conversion to new drm logging macros in gt/intel_reset.cWambui Karuga1-22/+26
This converts most instances of the printk based drm logging macros in i915/gt/intel_resect.c to the new struct drm_based logging macros. Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200128071437.9284-4-wambui.karugax@gmail.com
2020-01-28drm/i915/gt: Lift set-wedged engine dumping out of user pathsChris Wilson1-9/+22
The user (e.g. gem_eio) can manipulate the driver into wedging itself, allowing the user to trigger voluminous logging of inconsequential details. If we lift the dump to direct calls to intel_gt_set_wedged(), out of the intel_reset failure handling, we keep the detail logging for what we expect are true HW or test failures without being tricked. Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200127231540.3302516-6-chris@chris-wilson.co.uk
2020-01-10drm/i915: Start chopping up the GPU error captureChris Wilson1-1/+1
In the near future, we will want to start a GPU error capture from a new context, from inside the softirq region of a forced preemption. To do so requires us to break up the monolithic error capture to provide new entry points with finer control; in particular focusing on one engine/gt, and being able to compose an error state from little pieces of HW capture. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200110123059.1348712-1-chris@chris-wilson.co.uk
2020-01-06drm/i915/gt: Convert the final GEM_TRACE to GT_TRACE and coChris Wilson1-13/+8
Convert the few remaining GEM_TRACE() used for debugging over to the appropriate GT_TRACE or RQ_TRACE. References: 639f2f24895f ("drm/i915: Introduce new macros for tracing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-4-chris@chris-wilson.co.uk
2019-12-30drm/i915/gt: Avoid using the GPU before initialisationChris Wilson1-0/+3
Mark the GT as wedged so that we are not tempted to use it prior to initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191229183153.3719869-3-chris@chris-wilson.co.uk
2019-12-29drm/i915: prefer 3-letter acronym for ironlakeLucas De Marchi1-4/+3
We are currently using a mix of platform name and acronym to name the functions. Let's prefer the acronym as it should be clear what platform it's about and it's shorter, so it doesn't go over 80 columns in a few cases. This converts ironlake to ilk where appropriate. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191224084012.24241-7-lucas.demarchi@intel.com
2019-12-26drm/i915/gt: Ignore incomplete engines after init failureChris Wilson1-2/+2
Do not expose incomplete engines to the user after we fail to setup the GT. Fixes: e6ba76480299 ("drm/i915: Remove i915->kernel_context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191226191237.2654510-1-chris@chris-wilson.co.uk
2019-12-23drm/i915: Mark the GEM context link as RCU protectedChris Wilson1-6/+20
The only protection for intel_context.gem_cotext is granted by RCU, so annotate it as a rcu protected pointer and carefully dereference it in the few occasions we need to use it. Fixes: 9f3ccd40acf4 ("drm/i915: Drop GEM context as a direct link from i915_request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222233558.2201901-1-chris@chris-wilson.co.uk
2019-12-22drm/i915/gt: Pull GT initialisation under intel_gt_init()Chris Wilson1-3/+6
Begin pulling the GT setup underneath a single GT umbrella; let intel_gt take ownership of its engines! As hinted, the complication is the lifetime of the probed engine versus the active lifetime of the GT backends. We need to detect the engine layout early and keep it until the end so that we can sanitize state on takeover and release. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222120752.1368352-1-chris@chris-wilson.co.uk
2019-12-21drm/i915: Remove i915->kernel_contextChris Wilson1-5/+9
Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
2019-12-20drm/i915: Drop GEM context as a direct link from i915_requestChris Wilson1-17/+20
Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
2019-12-18drm/i915/gt: Remove direct invocation of breadcrumb signalingChris Wilson1-2/+2
Only signal the breadcrumbs from inside the irq_work, simplifying our interface and calling conventions. The micro-optimisation here is that by always using the irq_work interface, we know we are always inside an irq-off critical section for the breadcrumb signaling and can ellide save/restore of the irq flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217095642.3124521-7-chris@chris-wilson.co.uk
2019-12-13drm/i915: Introduce new macros for tracingVenkata Sandeep Dhanalakota1-1/+1
New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and GT_TRACE() are introduce to tag device name and engine name with contexts and requests tracing in i915. Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com
2019-12-10drm/i915/guc: kill the GuC clientDaniele Ceraolo Spurio1-2/+4
We now only use 1 client without any plan to add more. The client is also only holding information about the WQ and the process desc, so we can just move those in the intel_guc structure and always use stage_id 0. v2: fix comment (John) v3: fix the comment for real, fix kerneldoc Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-4-daniele.ceraolospurio@intel.com
2019-12-04drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSETAbdiel Janulgue1-1/+6
This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature comes from the value returned by this ioctl which is the offset into the device fd which userpace uses with mmap(2). mmap_gtt was our initial mmap_offset implementation, this extends our CPU mmap support to allow additional fault handlers that depends on the object's backing pages. Note that we multiplex mmap_gtt and mmap_offset through the same ioctl, and use the zero extending behaviour of drm to differentiate between them, when we inspect the flags. To support multiple mmap types on an object we need to support multiple mmap_offsets for an object (each offset in the global device address space corresponding to a unique instance of the object for a file + mmap type). As we drop the simplified drm core idea of a single mmap_offset, we need to provide replacement hooks for the dumb mmap interface as well. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675 Testcase: igt/gem_mmap_offset Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk
2019-11-20drm/i915/gt: Declare timeline.lock to be irq-freeChris Wilson1-5/+4
Now that we never allow the intel_wakeref callbacks to be invoked from interrupt context, we do not need the irqsafe spinlock for the timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120170858.3965380-1-chris@chris-wilson.co.uk
2019-11-20drm/i915: Mark up the calling context for intel_wakeref_put()Chris Wilson1-1/+1
Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk
2019-11-11drm/i915: Cancel context if it hangs after it is closedChris Wilson1-0/+7
If we detect a hang in a closed context, just flush all of its requests and cancel any remaining execution along the context. Note that after closing the context, the last reference to the context may be dropped, leaving it only valid under RCU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191111114323.5833-5-chris@chris-wilson.co.uk
2019-11-11drm/i915: Show guilty context name on GPU resetChris Wilson1-0/+4
We mention that we are resetting the GPU, and dump the device state for post mortem debugging. However, while that dump contains the active processes and the one flagged as causing the error, we do not always include that information in dmesg. Include the name of the guilty process in dmesg for reference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191111114323.5833-4-chris@chris-wilson.co.uk
2019-10-24drm/i915/gt: Replace hangcheck by heartbeatsChris Wilson1-2/+1
Replace sampling the engine state every so often with a periodic heartbeat request to measure the health of an engine. This is coupled with the forced-preemption to allow long running requests to survive so long as they do not block other users. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-5-chris@chris-wilson.co.uk
2019-10-18drm/i915: Pass in intel_gt at some for_each_engine sitesTvrtko Ursulin1-9/+9
Where the function, or code segment, operates on intel_gt, we need to start passing it instead of i915 to for_each_engine(_masked). This is another partial step in migration of i915->engines[] to gt->engines[]. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017094500.21831-2-tvrtko.ursulin@linux.intel.com
2019-10-18drm/i915: Make for_each_engine_masked work on intel_gtTvrtko Ursulin1-6/+6
Medium term goal is to eliminate the i915->engine[] array and to get there we have recently introduced equivalent array in intel_gt. Now we need to migrate the code further towards this state. This next step is to eliminate usage of i915->engines[] from the for_each_engine_masked iterator. For this to work we also need to use engine->id as index when populating the gt->engine[] array and adjust the default engine set indexing to use engine->legacy_idx instead of assuming gt->engines[] indexing. v2: * Populate gt->engine[] earlier. * Check that we don't duplicate engine->legacy_idx v3: * Work around the initialization order issue between default_engines() and intel_engines_driver_register() which sets engine->legacy_idx for now. It will be fixed properly later. v4: * Merge with forgotten v2.5. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017161852.8836-1-tvrtko.ursulin@linux.intel.com
2019-10-16drm/i915: Store i915_ggtt as the backpointer on fence registersChris Wilson1-1/+1
Now that i915_ggtt knows everything about its own paths to perform mmio, we can use that as our primary backpointer for individual fence registers. This reduces the amount of pointer dancing we have to perform on the common paths, but more importantly finishes our fence register encapsulation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191016143234.4075-1-chris@chris-wilson.co.uk
2019-10-10drm/i915/gt: Warn CI about an unrecoverable wedgeChris Wilson1-1/+7
If we have a wedged GPU that we need to recover, but fail, add a taint for CI to pickup and schedule a reboot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191002160034.5121-1-chris@chris-wilson.co.uk
2019-10-07drm/i915/gt: Prefer local path to runtime powermanagementChris Wilson1-3/+3
Avoid going to the base i915 device when we already have a path from gt to the runtime powermanagement interface. The benefit is that it looks a bit more self-consistent to always be acquiring the gt->uncore->rpm for use with the gt->uncore. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191007154531.1750-1-chris@chris-wilson.co.uk
2019-10-07drm/i915: make array hw_engine_mask static, makes object smallerColin Ian King1-3/+3
Don't populate the array hw_engine_mask on the stack but instead make it static. Makes the object code smaller by 316 bytes. Before: text data bss dec hex filename 34004 4388 320 38712 9738 gpu/drm/i915/gt/intel_reset.o After: text data bss dec hex filename 33528 4548 320 38396 95fc gpu/drm/i915/gt/intel_reset.o (gcc version 9.2.1, amd64) Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191007154151.23245-1-colin.king@canonical.com
2019-10-04drm/i915/overlay: Drop struct_mutex guardChris Wilson1-4/+0
The overlay uses the modeset mutex to control itself and only required the struct_mutex for requests, which is now obsolete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-16-chris@chris-wilson.co.uk