From 9e60290dbafdf577766e5fc5f2fdb3be450cf9a6 Mon Sep 17 00:00:00 2001 From: Lyude Date: Wed, 16 Mar 2016 15:18:04 -0400 Subject: drm/i915: Fix race condition in intel_dp_destroy_mst_connector() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After unplugging a DP MST display from the system, we have to go through and destroy all of the DRM connectors associated with it since none of them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector() doesn't do a good enough job of ensuring that throughout the destruction process that no modesettings can be done with the connectors. As it is right now, intel_dp_destroy_mst_connector() works like this: * Take all modeset locks * Clear the configuration of the crtc on the connector, if there is one * Drop all modeset locks, this is required because of circular dependency issues that arise with trying to remove the connector from sysfs with modeset locks held * Unregister the connector * Take all modeset locks, again * Do the rest of the required cleaning for destroying the connector * Finally drop all modeset locks for good This only works sometimes. During the destruction process, it's very possible that a userspace application will attempt to do a modesetting using the connector. When we drop the modeset locks, an ioctl handler such as drm_mode_setcrtc has the oppurtunity to take all of the modeset locks from us. When this happens, one thing leads to another and eventually we end up committing a mode with the non-existent connector: [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f [drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi And in some cases, such as with the T460s using an MST dock, this results in breaking modesetting and/or panicking the system. To work around this, we now unregister the connector at the very beginning of intel_dp_destroy_mst_connector(), grab all the modesetting locks, and then hold them until we finish the rest of the function. CC: stable@vger.kernel.org Signed-off-by: Lyude Signed-off-by: Rob Clark Reviewed-by: Ville Syrjälä Signed-off-by: Daniel Vetter Link: http://patchwork.freedesktop.org/patch/msgid/1458155884-13877-1-git-send-email-cpaul@redhat.com (cherry picked from commit 1f7717552ef1306be3b7ed28c66c6eff550e3a23) Signed-off-by: Jani Nikula diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c index a2bd698..937e772 100644 --- a/drivers/gpu/drm/i915/intel_dp_mst.c +++ b/drivers/gpu/drm/i915/intel_dp_mst.c @@ -506,6 +506,8 @@ static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr, struct intel_connector *intel_connector = to_intel_connector(connector); struct drm_device *dev = connector->dev; + intel_connector->unregister(intel_connector); + /* need to nuke the connector */ drm_modeset_lock_all(dev); if (connector->state->crtc) { @@ -519,11 +521,7 @@ static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr, WARN(ret, "Disabling mst crtc failed with %i\n", ret); } - drm_modeset_unlock_all(dev); - intel_connector->unregister(intel_connector); - - drm_modeset_lock_all(dev); intel_connector_remove_from_fbdev(intel_connector); drm_connector_cleanup(connector); drm_modeset_unlock_all(dev); -- cgit v0.10.2 From 1e8817b7f603464369b3f70895946254bf62ba8e Mon Sep 17 00:00:00 2001 From: Lyude Date: Fri, 11 Mar 2016 10:57:01 -0500 Subject: drm/i915: Call intel_dp_mst_resume() before resuming displays Since we need MST devices ready before we try to resume displays, calling this after intel_display_resume() can result in some issues with various laptop docks where the monitor won't turn back on after suspending the system. This order was originally changed in commit e7d6f7d70829 ("drm/i915: resume MST after reading back hw state") In order to fix some unclaimed register errors, however the actual cause of those has since been fixed. CC: stable@vger.kernel.org Signed-off-by: Lyude [danvet: Resolve conflicts with locking changes.] Signed-off-by: Daniel Vetter (cherry picked from commit a16b7658f4e0d4aec9bc3e75a5f0cc3f7a3a0422) Signed-off-by: Jani Nikula diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 20e8200..30798cb 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -758,10 +758,10 @@ static int i915_drm_resume(struct drm_device *dev) dev_priv->display.hpd_irq_setup(dev); spin_unlock_irq(&dev_priv->irq_lock); - intel_display_resume(dev); - intel_dp_mst_resume(dev); + intel_display_resume(dev); + /* * ... but also need to make sure that hotplug processing * doesn't cause havoc. Like in the driver load code we don't -- cgit v0.10.2 From 9dbaab56ac09f07a73fe83bf69bec3e31060080a Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 14 Mar 2016 09:01:57 +0000 Subject: drm/i915: Exit cherryview_irq_handler() after one pass MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This effectively reverts commit 8e5fd599eb219f1054e39b40d18b217af669eea9 Author: Ville Syrjälä Date: Wed Apr 9 13:28:50 2014 +0300 drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed as under continuous execlists load we can saturate the IRQ handler, destablising the tsc clock and triggering the NMI watchdog to declare a hung CPU. [ 552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large: [ 552.756080] clocksource: 'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff [ 552.756091] clocksource: 'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff [ 552.756210] clocksource: Switched to clocksource refined-jiffies [ 575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 [ 575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18 [ 575.217905] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [ 575.217915] 0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000 [ 575.217935] 0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000 [ 575.217951] ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e [ 575.217967] Call Trace: [ 575.217973] [] dump_stack+0x4f/0x72 [ 575.217994] [] watchdog_overflow_callback+0x151/0x160 [ 575.218003] [] __perf_event_overflow+0xa0/0x1e0 [ 575.218016] [] perf_event_overflow+0x14/0x20 [ 575.218028] [] intel_pmu_handle_irq+0x1da/0x460 [ 575.218042] [] ? poll_idle+0x3e/0x70 [ 575.218052] [] ? poll_idle+0x3e/0x70 [ 575.218064] [] perf_event_nmi_handler+0x28/0x50 [ 575.218075] [] nmi_handle+0x60/0x130 [ 575.218086] [] ? poll_idle+0x3e/0x70 [ 575.218096] [] do_nmi+0x140/0x470 [ 575.218108] [] end_repeat_nmi+0x1a/0x1e [ 575.218119] [] ? poll_idle+0x3e/0x70 [ 575.218129] [] ? poll_idle+0x3e/0x70 [ 575.218139] [] ? poll_idle+0x3e/0x70 [ 575.218148] <> [] cpuidle_enter_state+0xf3/0x2f0 [ 575.218164] [] cpuidle_enter+0x17/0x20 [ 575.218175] [] call_cpuidle+0x2a/0x40 [ 575.218185] [] cpu_startup_entry+0x273/0x330 [ 575.218196] [] start_secondary+0x10e/0x130 However, not servicing all available IIR within the handler does hurt the throughput of pathological nop execbuf by about 20%, with a similar effect upon the dispatch latency of a series of execbuf. v2: use do {} while(0) for a smaller patch, and easier to revert again I have reasonable confidence that we do not miss GT interrupts (as execlists provides a stress case with a failure mechanism easily detected by igt), however I have less confidence about all the other sources of interrupts and worry that may lose a display hotplug interrupt, for example. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467 Testcase: igt/gem_exec_nop/basic # requires NMI watchdog Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Antti Koskipää Cc: Tvrtko Ursulin Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin Reviewed-by: Ville Syrjälä Link: http://patchwork.freedesktop.org/patch/msgid/1457946117-6714-1-git-send-email-chris@chris-wilson.co.uk (cherry picked from commit 579de73b048a0a4c66c25a033ac76a2836e0cf73) Signed-off-by: Jani Nikula diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index d1a46ef..1c21220 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1829,7 +1829,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg) /* IRQs are synced during runtime_suspend, we don't require a wakeref */ disable_rpm_wakeref_asserts(dev_priv); - for (;;) { + do { master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL; iir = I915_READ(VLV_IIR); @@ -1857,7 +1857,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg) I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL); POSTING_READ(GEN8_MASTER_IRQ); - } + } while (0); enable_rpm_wakeref_asserts(dev_priv); -- cgit v0.10.2 From 42bf7b46a5de03c8e2dd28a1f105bc28b6485243 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Wed, 30 Mar 2016 11:08:33 +0200 Subject: drm/i915: fix deadlock on lid open MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e2c8b8701e2d moved modeset locking inside resume/suspend functions, but missed a code path only executed on lid close/open on older hardware. The result was a deadlock when closing and opening the lid without suspending on such hardware: ============================================= [ INFO: possible recursive locking detected ] 4.6.0-rc1 #385 Not tainted --------------------------------------------- kworker/0:3/88 is trying to acquire lock: (&dev->mode_config.mutex){+.+.+.}, at: [] intel_display_resume+0x4a/0x12f [i915] but task is already holding lock: (&dev->mode_config.mutex){+.+.+.}, at: [] drm_modeset_lock_all+0x3e/0xa6 [drm] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dev->mode_config.mutex); lock(&dev->mode_config.mutex); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by kworker/0:3/88: #0: ("kacpi_notify"){++++.+}, at: [] process_one_work+0x14a/0x50b #1: ((&dpc->work)#2){+.+.+.}, at: [] process_one_work+0x14a/0x50b #2: ((acpi_lid_notifier).rwsem){++++.+}, at: [] __blocking_notifier_call_chain+0x34/0x65 #3: (&dev_priv->modeset_restore_lock){+.+.+.}, at: [] intel_lid_notify+0x3c/0xd9 [i915] #4: (&dev->mode_config.mutex){+.+.+.}, at: [] drm_modeset_lock_all+0x3e/0xa6 [drm] #5: (crtc_ww_class_acquire){+.+.+.}, at: [] drm_modeset_lock_all+0x48/0xa6 [drm] #6: (crtc_ww_class_mutex){+.+.+.}, at: [] modeset_lock+0x13c/0x1cd [drm] stack backtrace: CPU: 0 PID: 88 Comm: kworker/0:3 Not tainted 4.6.0-rc1 #385 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 Workqueue: kacpi_notify acpi_os_execute_deferred 0000000000000000 ffff88022fd5f990 ffffffff8124af06 ffffffff825b39c0 ffffffff825b39c0 ffff88022fd5fa60 ffffffff8108f547 ffff88022fd5fa70 000000008108e817 ffff880230236cc0 0000000000000000 ffffffff825b39c0 Call Trace: [] dump_stack+0x67/0x90 [] __lock_acquire+0xdb5/0xf71 [] ? look_up_lock_class+0xbe/0x10a [] lock_acquire+0x137/0x1cb [] ? lock_acquire+0x137/0x1cb [] ? intel_display_resume+0x4a/0x12f [i915] [] mutex_lock_nested+0x7e/0x3a4 [] ? intel_display_resume+0x4a/0x12f [i915] [] ? intel_display_resume+0x4a/0x12f [i915] [] ? modeset_lock+0x13c/0x1cd [drm] [] intel_display_resume+0x4a/0x12f [i915] [] ? intel_display_resume+0x4a/0x12f [i915] [] ? modeset_lock+0x13c/0x1cd [drm] [] ? modeset_lock+0x13c/0x1cd [drm] [] ? drm_modeset_lock+0x17/0x24 [drm] [] ? drm_modeset_lock_all_ctx+0x87/0xa1 [drm] [] intel_lid_notify+0xb0/0xd9 [i915] [] notifier_call_chain+0x4a/0x6c [] __blocking_notifier_call_chain+0x4d/0x65 [] blocking_notifier_call_chain+0x14/0x16 [] acpi_lid_send_state+0x83/0xad [button] [] acpi_button_notify+0x41/0x132 [button] [] acpi_device_notify+0x19/0x1b [] acpi_ev_notify_dispatch+0x49/0x64 [] acpi_os_execute_deferred+0x14/0x20 [] process_one_work+0x265/0x50b [] worker_thread+0x1fc/0x2dd [] ? rescuer_thread+0x309/0x309 [] ? rescuer_thread+0x309/0x309 [] kthread+0xe0/0xe8 [] ? local_clock+0x19/0x22 [] ret_from_fork+0x22/0x40 [] ? kthread_create_on_node+0x1b5/0x1b5 Fixes: e2c8b8701e2d ("drm/i915: Use atomic helpers for suspend, v2.") Cc: Maarten Lankhorst Signed-off-by: Bjørn Mork Signed-off-by: Maarten Lankhorst Link: http://patchwork.freedesktop.org/patch/msgid/1459328913-13719-1-git-send-email-bjorn@mork.no (cherry picked from commit 9f54d4bd5808b5c892a44c539c126b71d299f341) Signed-off-by: Jani Nikula diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c index 30a8403..cd9fe60 100644 --- a/drivers/gpu/drm/i915/intel_lvds.c +++ b/drivers/gpu/drm/i915/intel_lvds.c @@ -478,11 +478,8 @@ static int intel_lid_notify(struct notifier_block *nb, unsigned long val, * and as part of the cleanup in the hw state restore we also redisable * the vga plane. */ - if (!HAS_PCH_SPLIT(dev)) { - drm_modeset_lock_all(dev); + if (!HAS_PCH_SPLIT(dev)) intel_display_resume(dev); - drm_modeset_unlock_all(dev); - } dev_priv->modeset_restore = MODESET_DONE; -- cgit v0.10.2