summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/nouveau
AgeCommit message (Collapse)AuthorFilesLines
2019-04-20drm/nouveau/volt/gf117: fix speedo readout registerIlia Mirkin4-1/+63
[ Upstream commit fc782242749fa4235592854fafe1a1297583c1fb ] GF117 appears to use the same register as GK104 (but still with the general Fermi readout mechanism). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108980 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failureYueHaibing1-1/+1
[ Upstream commit 909e9c9c428376e2a43d178ed4b0a2d5ba9cb7d3 ] pm_runtime_get_sync returns negative on failure. Fixes: eaeb9010bb4b ("drm/nouveau/debugfs: Wake up GPU before doing any reclocking") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05drm/nouveau: Stop using drm_crtc_force_disableDaniel Vetter1-1/+3
[ Upstream commit 934c5b32a5e43d8de2ab4f1566f91d7c3bf8cb64 ] The correct way for legacy drivers to update properties that need to do a full modeset, is to do a full modeset. Note that we don't need to call the drm_mode_config_internal helper because we're not changing any of the refcounted paramters. v2: Fixup error handling (Ville). Since the old code didn't bother I decided to just delete it instead of adding even more code for just error handling. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1) Cc: Sean Paul <seanpaul@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181217194303.14397-2-daniel.vetter@ffwll.ch Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-20drm/nouveau/falcon: avoid touching registers if engine is offIlia Mirkin1-2/+5
[ Upstream commit a5176a4cb85bb6213daadf691097cf411da35df2 ] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108980 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-20drm/nouveau: Don't disable polling in fallback modeTakashi Iwai1-3/+4
[ Upstream commit 118780066e30c34de3d9349710b51780bfa0ba83 ] When a fan is controlled via linear fallback without cstate, we shouldn't stop polling. Otherwise it won't be adjusted again and keeps running at an initial crazy pace. Fixes: 800efb4c2857 ("drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios") Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1103356 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107447 Reported-by: Thomas Blume <thomas.blume@suse.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-13drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume()Lyude Paul1-2/+10
commit b89fdf7ae8500feae1100d8b283176a44d31d698 upstream. We need to actually make sure we check this on resume since otherwise we won't know whether or not the topology is still there once we've resumed, which will cause us to still think the topology is connected even after it's been removed if the removal happens mid-suspend. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19drm/nouveau/kms/nv50-: also flush fb writes when rewinding push bufferBen Skeggs1-11/+18
commit 970a5ee41c72df46e3b0f307528c7d8ef7734a2e upstream. Should hopefully fix a regression some people have been seeing since EVO push buffers were moved to VRAM by default on Pascal GPUs. Fixes: d00ddd9da ("drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascal") Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19drm/nouveau/kms: Fix memory leak in nv50_mstm_del()Lyude Paul1-0/+1
commit 24199c5436f267399afed0c4f1f57663c0408f57 upstream. Noticed this while working on redoing the reference counting scheme in the DP MST helpers. Nouveau doesn't attempt to call drm_dp_mst_topology_mgr_destroy() at all, which leaves it leaking all of the resources for drm_dp_mst_topology_mgr and it's children mstbs+ports. Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 multi-stream") Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21drm/nouveau: Fix nv50_mstc->best_encoder()Lyude Paul1-10/+4
commit 7b0f61e91b6056c71649efa3204112a4b6cf5fc8 upstream. As mentioned in the previous commit, we currently prevent new modesets on recently-removed MST connectors by returning no encoder from our ->best_encoder() callback once the MST port has disappeared. This is wrong however, because it prevents legacy modesetting users from being able to disable CRTCs on MST connectors after the connector's respective topology has disappeared. So, fix this by instead by just always returning a valid encoder. Changes since v2: - Remove usage of atomic MST helper for now, since that got replaced with a much simpler solution Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20181008232437.5571-3-lyude@redhat.com (cherry picked from commit e87b0bbc9f0380d403f8f2f6abba0d51c74d944f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21drm/nouveau: Check backlight IDs are >= 0, not > 0Lyude Paul1-2/+2
commit dc854914999d5d52ac1b31740cb0ea8d89d0372e upstream. Remember, ida IDs start at 0, not 1! Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21drm/nouveau/secboot/acr: fix memory leakGustavo A. R. Silva1-0/+1
[ Upstream commit 74a07c0a59fa372b069d879971ba4d9e341979cf ] In case memory resources for *bl_desc* were allocated, release them before return. Addresses-Coverity-ID: 1472021 ("Resource leak") Fixes: 0d466901552a ("drm/nouveau/secboot/acr: Remove VLA usage") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05drm/nouveau/drm/nouveau: Grab runtime PM ref in nv50_mstc_detect()Lyude Paul1-1/+14
While we currently grab a runtime PM ref in nouveau's normal connector detection code, we apparently don't do this for MST. This means if we're in a scenario where the GPU is suspended and userspace attempts to do a connector probe on an MSTC connector, the probe will fail entirely due to the DP aux channel and GPU not being woken up: [ 316.633489] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff [ 316.635713] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff [ 316.637785] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff ... So, grab a runtime PM ref here. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-13drm/nouveau/devinit: fix warning when PMU/PRE_OS is missingBen Skeggs1-10/+11
Messed up when sending pull request and sent an outdated version of previous patch, this fixes it up to remove warnings. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP ↵Ben Skeggs4-3/+28
panels Fixes eDP backlight issues on more recent laptops. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/disp: fix DP disable raceBen Skeggs4-9/+19
If a HPD pulse signalling the need to retrain the link occurs between the KMS driver releasing the output and the supervisor interrupt that finishes the teardown, it was possible get a NULL-ptr deref. Avoid this by marking the link as inactive earlier. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/disp: move eDP panel power handlingBen Skeggs2-25/+35
We need to do this earlier to prevent aux channel timeouts in resume paths on certain systems. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/disp: remove unused struct memberBen Skeggs2-2/+0
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOSBen Skeggs1-1/+2
This Falcon application doesn't appear to be present on some newer systems, so let's not fail init if we can't find it. TBD: is there a way to determine whether it *should* be there? Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointerBen Skeggs1-1/+1
Fixes oopses in certain failure paths. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: fix oops in client init failure pathBen Skeggs1-7/+7
The NV_ERROR macro requires drm->client to be initialised, which it may not be at this stage of the init process. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Fix nouveau_connector_ddc_detect()Lyude Paul1-21/+28
It looks like that when we moved over to using drm_connector_for_each_possible_encoder() in nouveau, that one rather important part of this function got dropped by accident: /* Right v here */ for (i = 0; nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER; i++) { int id = connector->encoder_ids[i]; if (id == 0) break; Since it's rather difficult to notice: the conditional in this loop is actually: nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER Meaning that all early breaks result in nv_encoder keeping it's value, otherwise nv_encoder = NULL. Ugh. Since this got dropped, nouveau_connector_ddc_detect() now returns an encoder for every single connector, regardless of whether or not it's detected: [ 1780.056185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-2 So: fix this to ensure we only return an encoder if we actually found one, and clean up the rest of the function while we're at it since it's nearly impossible to read properly. Changes since v1: - Don't skip ddc probing for LVDS if we can't switch DDC through vga-switcheroo, just do the DDC probing without calling vga_switcheroo_lock_ddc() - skeggsb Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: ddba766dd07e ("drm/nouveau: Use drm_connector_for_each_possible_encoder()") Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unloadLyude Paul3-5/+8
Currently, there's nothing in nouveau that actually cancels this work struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky enough hpd_work might try to keep running up until the system is suspended. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too earlyLyude Paul1-6/+20
On most systems with ACPI hotplugging support, it seems that we always receive a hotplug event once we re-enable EC interrupts even if the GPU hasn't even been resumed yet. This can cause problems since even though we schedule hpd_work to handle connector reprobing for us, hpd_work synchronizes on pm_runtime_get_sync() to wait until the device is ready to perform reprobing. Since runtime suspend/resume callbacks are disabled before the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during this period will grab a runtime PM ref and return immediately with -EACCES. Because we schedule hpd_work from our ACPI HPD handler, and hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch a connector reprobe immediately even if the GPU isn't actually resumed just yet. This causes various warnings in dmesg and occasionally, also prevents some displays connected to the dedicated GPU from coming back up after suspend. Example: usb 1-4: USB disconnect, device number 14 usb 1-4.1: USB disconnect, device number 15 WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau] CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1 Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018 Workqueue: events nouveau_display_hpd_work [nouveau] RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau] RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000 RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002 RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000 R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418 FS: 0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? _cond_resched+0x15/0x30 nouveau_connector_detect+0x2ce/0x520 [nouveau] ? _cond_resched+0x15/0x30 ? ww_mutex_lock+0x12/0x40 drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper] drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper] nouveau_display_hpd_work+0x2a/0x60 [nouveau] process_one_work+0x187/0x340 worker_thread+0x2e/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x35/0x40 Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff ---[ end trace 55d811b38fc8e71a ]--- So, to fix this we attempt to grab a runtime PM reference in the ACPI handler itself asynchronously. If the GPU is already awake (it will have normal hotplugging at this point) or runtime PM callbacks are currently disabled on the device, we drop our reference without updating the autosuspend delay. We only schedule connector reprobes when we successfully managed to queue up a resume request with our asynchronous PM ref. This also has the added benefit of preventing redundant connector reprobes from ACPI while the GPU is runtime resumed! Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <kherbst@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41 Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Reset MST branching unit before enablingLyude Paul1-8/+12
When probing a new MST device, it's not safe to make any assumptions about it's current state. While most well mannered MST hubs will just disable the branching unit on hotplug disconnects, this isn't enough to save us from various other scenarios that might have resulted in something writing to the MST branching unit before we got control of it. This could happen if a previous probe we tried failed, if we're booting in kexec context and the hub is still in the state the last kernel put it in, etc. Luckily; there is no reason we can't just reset the branching unit every time we enable a new topology. So, fix this by resetting it on enabling new topologies to ensure that we always start off with a clean, unmodified topology state on MST sinks. This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX times out all DPCD trasactions) observed after multiple docks, undocks, and module reloads. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Only write DP_MSTM_CTRL when neededLyude Paul1-9/+36
Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST hub every time it receives a long HPD pulse on DP. This isn't actually necessary and additionally, has some unintended side effects. With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to make it rather likely (1 out of 5 times usually) that bringing up MST with it's ThinkPad dock will fail and result in sideband messages timing out in the middle. Afterwards, successive probes don't manage to get the dock to communicate properly over MST sideband properly. Many times sideband message timeouts from MST hubs are indicative of either the source or the sink dropping an ESI event, which can cause DRM's perspective of the topology's current state to go out of sync with reality. While it's tough to really know for sure what's happening to the dock, using userspace tools to write to DP_MSTM_CTRL in the middle of the MST link probing process does appear to make things flaky. It's possible that when we write to DP_MSTM_CTRL, the function that gets triggered to respond in the dock's firmware temporarily puts it in a state where it might end up not reporting an ESI to the source, or ends up dropping a sideband message we sent it. So, to fix this we make it so that when probing an MST topology, we respect it's current state. If the dock's already enabled, we simply read DP_MSTM_CTRL and disable the topology if it's value is not what we expected. Otherwise, we perform the normal MST probing dance. We avoid taking any action except if the state of the MST topology actually changes. This fixes MST sideband message timeouts and detection failures on my P50 with its ThinkPad dock. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Remove useless poll_enable() call in drm_load()Lyude Paul1-3/+1
Again, this doesn't do anything. drm_kms_helper_poll_enable() will have already been called in nouveau_display_init() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()Lyude Paul1-1/+0
This won't do anything but potentially make us miss hotplugs. We already call drm_kms_helper_poll_disable() in nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()Lyude Paul1-1/+0
This doesn't do anything, drm_kms_helper_poll_enable() gets called in nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init() already. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Fix deadlocks in nouveau_connector_detect()Lyude Paul1-0/+22
When we disable hotplugging on the GPU, we need to be able to synchronize with each connector's hotplug interrupt handler before the interrupt is finally disabled. This can be a problem however, since nouveau_connector_detect() currently grabs a runtime power reference when handling connector probing. This will deadlock the runtime suspend handler like so: [ 861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds. [ 861.483290] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.486332] kworker/0:2 D 0 61 2 0x80000000 [ 861.487044] Workqueue: events nouveau_display_hpd_work [nouveau] [ 861.487737] Call Trace: [ 861.488394] __schedule+0x322/0xaf0 [ 861.489070] schedule+0x33/0x90 [ 861.489744] rpm_resume+0x19c/0x850 [ 861.490392] ? finish_wait+0x90/0x90 [ 861.491068] __pm_runtime_resume+0x4e/0x90 [ 861.491753] nouveau_display_hpd_work+0x22/0x60 [nouveau] [ 861.492416] process_one_work+0x231/0x620 [ 861.493068] worker_thread+0x44/0x3a0 [ 861.493722] kthread+0x12b/0x150 [ 861.494342] ? wq_pool_ids_show+0x140/0x140 [ 861.494991] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.495648] ret_from_fork+0x3a/0x50 [ 861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds. [ 861.496968] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.498341] kworker/6:2 D 0 320 2 0x80000080 [ 861.499045] Workqueue: pm pm_runtime_work [ 861.499739] Call Trace: [ 861.500428] __schedule+0x322/0xaf0 [ 861.501134] ? wait_for_completion+0x104/0x190 [ 861.501851] schedule+0x33/0x90 [ 861.502564] schedule_timeout+0x3a5/0x590 [ 861.503284] ? mark_held_locks+0x58/0x80 [ 861.503988] ? _raw_spin_unlock_irq+0x2c/0x40 [ 861.504710] ? wait_for_completion+0x104/0x190 [ 861.505417] ? trace_hardirqs_on_caller+0xf4/0x190 [ 861.506136] ? wait_for_completion+0x104/0x190 [ 861.506845] wait_for_completion+0x12c/0x190 [ 861.507555] ? wake_up_q+0x80/0x80 [ 861.508268] flush_work+0x1c9/0x280 [ 861.508990] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 861.509735] nvif_notify_put+0xb1/0xc0 [nouveau] [ 861.510482] nouveau_display_fini+0xbd/0x170 [nouveau] [ 861.511241] nouveau_display_suspend+0x67/0x120 [nouveau] [ 861.511969] nouveau_do_suspend+0x5e/0x2d0 [nouveau] [ 861.512715] nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau] [ 861.513435] pci_pm_runtime_suspend+0x6b/0x180 [ 861.514165] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.514897] __rpm_callback+0x7a/0x1d0 [ 861.515618] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.516313] rpm_callback+0x24/0x80 [ 861.517027] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.517741] rpm_suspend+0x142/0x6b0 [ 861.518449] pm_runtime_work+0x97/0xc0 [ 861.519144] process_one_work+0x231/0x620 [ 861.519831] worker_thread+0x44/0x3a0 [ 861.520522] kthread+0x12b/0x150 [ 861.521220] ? wq_pool_ids_show+0x140/0x140 [ 861.521925] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.522622] ret_from_fork+0x3a/0x50 [ 861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds. [ 861.523977] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.525349] kworker/6:0 D 0 1329 2 0x80000000 [ 861.526073] Workqueue: events nvif_notify_work [nouveau] [ 861.526751] Call Trace: [ 861.527411] __schedule+0x322/0xaf0 [ 861.528089] schedule+0x33/0x90 [ 861.528758] rpm_resume+0x19c/0x850 [ 861.529399] ? finish_wait+0x90/0x90 [ 861.530073] __pm_runtime_resume+0x4e/0x90 [ 861.530798] nouveau_connector_detect+0x7e/0x510 [nouveau] [ 861.531459] ? ww_mutex_lock+0x47/0x80 [ 861.532097] ? ww_mutex_lock+0x47/0x80 [ 861.532819] ? drm_modeset_lock+0x88/0x130 [drm] [ 861.533481] drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper] [ 861.534127] drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper] [ 861.534940] nouveau_connector_hotplug+0x98/0x120 [nouveau] [ 861.535556] nvif_notify_work+0x2d/0xb0 [nouveau] [ 861.536221] process_one_work+0x231/0x620 [ 861.536994] worker_thread+0x44/0x3a0 [ 861.537757] kthread+0x12b/0x150 [ 861.538463] ? wq_pool_ids_show+0x140/0x140 [ 861.539102] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.539815] ret_from_fork+0x3a/0x50 [ 861.540521] Showing all locks held in the system: [ 861.541696] 2 locks held by kworker/0:2/61: [ 861.542406] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543071] #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543814] 1 lock held by khungtaskd/64: [ 861.544535] #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 861.545160] 3 locks held by kworker/6:2/320: [ 861.545896] #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.546702] #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.547443] #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau] [ 861.548146] 1 lock held by dmesg/983: [ 861.548889] 2 locks held by zsh/1250: [ 861.549605] #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 861.550393] #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 861.551122] 6 locks held by kworker/6:0/1329: [ 861.551957] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.552765] #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.553582] #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper] [ 861.554357] #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper] [ 861.555227] #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper] [ 861.556133] #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm] [ 861.557864] ============================================= [ 861.559507] NMI backtrace for cpu 2 [ 861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 861.561948] Call Trace: [ 861.562757] dump_stack+0x8e/0xd3 [ 861.563516] nmi_cpu_backtrace.cold.3+0x14/0x5a [ 861.564269] ? lapic_can_unplug_cpu.cold.27+0x42/0x42 [ 861.565029] nmi_trigger_cpumask_backtrace+0xa1/0xae [ 861.565789] arch_trigger_cpumask_backtrace+0x19/0x20 [ 861.566558] watchdog+0x316/0x580 [ 861.567355] kthread+0x12b/0x150 [ 861.568114] ? reset_hung_task_detector+0x20/0x20 [ 861.568863] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.569598] ret_from_fork+0x3a/0x50 [ 861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7: [ 861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120 [ 861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120 [ 861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120 [ 861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120 [ 861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120 [ 861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120 [ 861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120 [ 861.572428] Kernel panic - not syncing: hung_task: blocked tasks So: fix this by making it so that normal hotplug handling /only/ happens so long as the GPU is currently awake without any pending runtime PM requests. In the event that a hotplug occurs while the device is suspending or resuming, we can simply defer our response until the GPU is fully runtime resumed again. Changes since v4: - Use a new trick I came up with using pm_runtime_get() instead of the hackish junk we had before Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()Lyude Paul1-9/+11
It's true we can't resume the device from poll workers in nouveau_connector_detect(). We can however, prevent the autosuspend timer from elapsing immediately if it hasn't already without risking any sort of deadlock with the runtime suspend/resume operations. So do that instead of entirely avoiding grabbing a power reference. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requestsLyude Paul4-2/+64
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed() function provided by DRM as it's output_poll_changed callback. Unfortunately however, this function doesn't grab runtime PM references early enough and even if it did-we can't block waiting for the device to resume in output_poll_changed() since it's very likely that we'll need to grab the fb_helper lock at some point during the runtime resume process. This currently results in deadlocking like so: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================= After trying dozens of different solutions, I found one very simple one that should also have the benefit of preventing us from having to fight locking for the rest of our lives. So, we work around these deadlocks by deferring all fbcon hotplug events that happen after the runtime suspend process starts until after the device is resumed again. Changes since v7: - Fixup commit message - Daniel Vetter Changes since v6: - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia Changes since v5: - Come up with the (hopefully final) solution for solving this dumb problem, one that is a lot less likely to cause issues with locking in the future. This should work around all deadlock conditions with fbcon brought up thus far. Changes since v4: - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock condition that Lukas described - Just move all of this out of drm_fb_helper. It seems that other DRM drivers have already figured out other workarounds for this. If other drivers do end up needing this in the future, we can just move this back into drm_fb_helper again. Changes since v3: - Actually check if fb_helper is NULL in both new helpers - Actually check drm_fbdev_emulation in both new helpers - Don't fire off a fb_helper hotplug unconditionally; only do it if the following conditions are true (as otherwise, calling this in the wrong spot will cause Bad Things to happen): - fb_helper hotplug handling was actually inhibited previously - fb_helper actually has a delayed hotplug pending - fb_helper is actually bound - fb_helper is actually initialized - Add __must_check to drm_fb_helper_suspend_hotplug(). There's no situation where a driver would actually want to use this without checking the return value, so enforce that - Rewrite and clarify the documentation for both helpers. - Make sure to return true in the drm_fb_helper_suspend_hotplug() stub that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION isn't enabled - Actually grab the toplevel fb_helper lock in drm_fb_helper_resume_hotplug(), since it's possible other activity (such as a hotplug) could be going on at the same time the driver calls drm_fb_helper_resume_hotplug(). We need this to check whether or not drm_fb_helper_hotplug_event() needs to be called anyway Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()Lyude Paul1-1/+0
Since actual hotplug notifications don't get disabled until nouveau_display_fini() is called, all this will do is cause any hotplugs that happen between this drm_kms_helper_poll_disable() call and the actual hotplug disablement to potentially be dropped if ACPI isn't around to help us. Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placementLyude Paul1-2/+5
Turns out this part is my fault for not noticing when reviewing 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work(). This makes basically no sense however, because that means we're calling drm_kms_helper_poll_enable() every time we schedule the hotplug detection work. This is also against the advice mentioned in drm_kms_helper_poll_enable()'s documentation: Note that calls to enable and disable polling must be strictly ordered, which is automatically the case when they're only call from suspend/resume callbacks. Of course, hotplugs can't really be ordered. They could even happen immediately after we called drm_kms_helper_poll_disable() in nouveau_display_fini(), which can lead to all sorts of issues. Additionally; enabling polling /after/ we call drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to probe connectors so long as polling is disabled. So; simply move this back into nouveau_display_init() again. The race condition that both of these patches attempted to work around has already been fixed properly in d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend") Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling") Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-30BackMerge v4.18-rc7 into drm-nextDave Airlie12-49/+105
rmk requested this for armada and I think we've had a few conflicts build up. Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-07-20Merge tag 'drm-misc-next-2018-07-18' of ↵Dave Airlie6-14/+14
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 4.19: Core Changes: - add support for DisplayPort CEC-Tunneling-over-AUX (Hans Verkuil) - more doc updates (Daniel Vetter) - fourcc: Add is_yuv field to drm_format_info (Ayan Kumar Halder) - dma-buf: correctly place BUG_ON (Michel Dänzer) Driver Changes: - more vkms support(Rodrigo Siqueira) - many fixes and small improments to all drivers Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180718200826.GA20165@juma
2018-07-20Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-nextDave Airlie26-58/+161
misc fixes and cleanups for next. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv55CfRonQ0bo2XiitkCiWTjKwhsP=+ZFhoa-BaJ72Ryew@mail.gmail.com
2018-07-20Merge branch 'linux-4.18' of git://github.com/skeggsb/linux into drm-fixesDave Airlie12-49/+105
- fix problem with pascal and large memory systems - fix a bunch of MST problems - fix a runtime PM interaction with MST Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv79O8deSts2fxJ_oS6=q8yA+OgwBSEpp5R=BQBmWa+oyg@mail.gmail.com
2018-07-19drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascalBen Skeggs1-2/+27
Workaround for issues seen on systems with large amounts of RAM, caused by display not supporting the same physical address limits as the other parts of the GPU. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-19drm/nouveau/fb/gp100-: disable address remapperBen Skeggs4-0/+15
This was causing problems on a system with a large amount of RAM, where display push buffers were being fetched incorrectly when placed in high system memory addresses. While this commit will resolve the issue on that particular system, the issue will be avoided completely with another patch to more fully solve problems with display and large amounts of system memory on Pascal. It's still probably a good idea to disable this to prevent weird issues in the future. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: tegra: Detach from ARM DMA/IOMMU mappingThierry Reding1-0/+13
Depending on the kernel configuration, early ARM architecture setup code may have attached the GPU to a DMA/IOMMU mapping that transparently uses the IOMMU to back the DMA API. Tegra requires special handling for IOMMU backed buffers (a special bit in the GPU's MMU page tables indicates the memory path to take: via the SMMU or directly to the memory controller). Transparently backing DMA memory with an IOMMU prevents Nouveau from properly handling such memory accesses and causes memory access faults. As a side-note: buffers other than those allocated in instance memory don't need to be physically contiguous from the GPU's perspective since the GPU can map them into contiguous buffers using its own MMU. Mapping these buffers through the IOMMU is unnecessary and will even lead to performance degradation because of the additional translation. One exception to this are compressible buffers which need large pages. In order to enable these large pages, multiple small pages will have to be combined into one large (I/O virtually contiguous) mapping via the IOMMU. However, that is a topic outside the scope of this fix and isn't currently supported. An implementation will want to explicitly create these large pages in the Nouveau driver, so detaching from a DMA/IOMMU mapping would still be required. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nicolas Chauvet <kwizart@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/secboot/acr: Remove VLA usageKees Cook2-4/+37
In the quest to remove all stack VLA usage from the kernel[1], this allocates the working buffers before starting the writing so it won't abort in the middle. This needs an initial walk of the lists to figure out how large the buffer should be. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_dev_unref with drm_dev_putThomas Zimmermann1-1/+1
This patch unifies the naming of DRM functions for reference counting of struct drm_device. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_gem_object_unreference_unlocked with put functionThomas Zimmermann4-13/+13
This patch unifies the naming of DRM functions for reference counting of struct drm_gem_object. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_framebuffer_{un/reference} with put, get functionsThomas Zimmermann1-1/+1
This patch unifies the naming of DRM functions for reference counting of struct drm_framebuffer. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/nvif: remove const attribute from nvif_mclassNick Desaulniers1-1/+1
Similar to commit 0bf8bf50eddc ("module: Remove const attribute from alias for MODULE_DEVICE_TABLE") Fixes many -Wduplicate-decl-specifier warnings due to the combination of const typeof() of already const variables. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/hwmon: potential uninitialized variablesDan Carpenter1-6/+6
Smatch complains that "value" can be uninitialized when kstrtol() returns -ERANGE. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Fix runtime PM leak in drm_open()Lyude Paul1-2/+4
Noticed this as I was skimming through, if we fail to allocate memory for cli we'll end up returning without dropping the runtime PM ref we got. Additionally, we'll even return the wrong return code! (ret most likely will == 0 here, we want -ENOMEM). Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/debugfs: Wake up GPU before doing any reclockingKarol Herbst1-0/+4
Fixes various reclocking related issues on prime systems. Signed-off-by: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/bios/vpstate: There are some fermi vbios with no boost or tdp entryKarol Herbst1-2/+8
If the entry size is too small, default to invalid values for both boost_id and tdp_id, so as to default to the base clock in both cases. Signed-off-by: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/kms/nv50-: Allow vblank_disable_immediateMario Kleiner1-0/+3
With instantaneous high precision vblank timestamping that updates at leading edge of vblank, the emulated "hw vblank counter" from vblank timestamping, which increments at leading edge of vblank, and reliable page flip execution and completion at leading edge of vblank, we should meet the requirements for fast/ immediate vblank irq disable/enable. This is only allowed on nv50+ gpu's, ie. the ones with atomic modesetting. One requirement for immediate vblank disable is that high precision vblank timestamping works reliably all the time on all connectors. This is not the case on all pre-nv50 parts for analog VGA outputs, where we currently don't always have support for scanout position queries and therefore fall back to vblank interrupt timestamping. The implementation in nv04_head_state() does not return valid values for vblanks, vtotal, hblanks, htotal for VGA outputs on all cards, but those are needed for scanout position queries. Testing on Linux-4.12-rc5 + drm-next on a GeForce 9500 GT (NV G96) with timing measurement equipment indicates this works fine, so allow immediate vblank disable for power saving. For debugging in case of unexpected trouble, booting with kernel cmdline option drm.vblankoffdelay=0 (or echo 0 > /sys/module/drm/parameters/vblankoffdelay) would keep vblank irqs permanently on to approximate old behavior. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>