linux - NXP/Freescale LSDK linux tree with Scalys patches

Age	Commit message (Collapse)	Author
2017-01-09	scsi: avoid a permanent stop of the scsi device's request queue	Wei Fang
	commit d2a145252c52792bc59e4767b486b26c430af4bb upstream. A race between scanning and fc_remote_port_delete() may result in a permanent stop if the device gets blocked before scsi_sysfs_add_sdev() and unblocked after. The reason is that blocking a device sets both the SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED. However, scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the device to be ignored by scsi_target_unblock() and thus never have its QUEUE_FLAG_STOPPED cleared leading to a device which is apparently running but has a stopped queue. We actually have two places where SDEV_RUNNING is set: once in scsi_add_lun() which respects the blocked flag and once in scsi_sysfs_add_sdev() which doesn't. Since the second set is entirely spurious, simply remove it to fix the problem. Reported-by: Zengxi Chen <chenzengxi@huawei.com> Signed-off-by: Wei Fang <fangwei1@huawei.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	scsi: zfcp: fix rport unblock race with LUN recovery	Steffen Maier
	commit 6f2ce1c6af37191640ee3ff6e8fc39ea10352f4c upstream. It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport. However, for the case when the rport becomes available again, we should prevent unblocking the rport too early. In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending which in turn force zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY. This also opens a time window with unblocked rport (until the followup LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. Finished intentionally includes any recovery result no matter if successful or failed (still unblock rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery if there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics. For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level. Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence: ERP thread rport_work other context ------------------------- -------------- -------------------------------- port is unblocked, rport still blocked, due to pending/running ERP action, so ((port->status & ...UNBLOCK) != 0) and (port->rport == NULL) unlock ERP zfcp_erp_action_cleanup() case ZFCP_ERP_ACTION_REOPEN_LUN: zfcp_erp_try_rport_unblock() ((status & ...UNBLOCK) != 0) [OLD!] zfcp_erp_port_reopen() lock ERP zfcp_erp_port_block() port->status clear ...UNBLOCK unlock ERP zfcp_scsi_schedule_rport_block() port->rport_task = RPORT_DEL queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task != RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_block() if (!port->rport) return zfcp_scsi_schedule_rport_register() port->rport_task = RPORT_ADD queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task == RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_register() (port->rport == NULL) rport = fc_remote_port_add() port->rport = rport; Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed. In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN. With that, the possible corresponding rport state sequences would be: (unblock[ERP thread],block[other context]) if the ERP thread gets erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock after the other context has already cleard ...UNBLOCK from port->status. Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check status of zfcp_port and LUN since these are only changed under erp_lock elsewhere. Regarding the check of the proper status flags (port or port_forced are similar to the shown adapter recovery): [zfcp_erp_adapter_shutdown()] zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() * clear UNBLOCK ---------------------------------------+ zfcp_scsi_schedule_rports_block() \| write_lock_irqsave(&adapter->erp_lock, flags);-------+ \| zfcp_erp_action_enqueue() \| \| zfcp_erp_setup_act() \| \| * set ERP_INUSE -----------------------------------\|--\|--+ write_unlock_irqrestore(&adapter->erp_lock, flags);--+ \| \| .context-switch. \| \| zfcp_erp_thread() \| \| zfcp_erp_strategy() \| \| write_lock_irqsave(&adapter->erp_lock, flags);------+ \| \| ... \| \| \| zfcp_erp_strategy_check_target() \| \| \| zfcp_erp_strategy_check_adapter() \| \| \| zfcp_erp_adapter_unblock() \| \| \| * set UNBLOCK -----------------------------------\|--+ \| zfcp_erp_action_dequeue() \| \| * clear ERP_INUSE ---------------------------------\|-----+ ... \| write_unlock_irqrestore(&adapter->erp_lock, flags);-+ Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link down case which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8830271c4819 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	scsi: zfcp: do not trace pure benign residual HBA responses at default level	Steffen Maier
	commit 56d23ed7adf3974f10e91b643bd230e9c65b5f79 upstream. Since quite a while, Linux issues enough SCSI commands per scsi_device which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE, and SAM_STAT_GOOD. This floods the HBA trace area and we cannot see other and important HBA trace records long enough. Therefore, do not trace HBA response errors for pure benign residual under counts at the default trace level. This excludes benign residual under count combined with other validity bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL. For all those other cases, we still do want to see both the HBA record and the corresponding SCSI record by default. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	scsi: zfcp: fix use-after-"free" in FC ingress path after TMF	Benjamin Block
	commit dac37e15b7d511e026a9313c8c46794c144103cd upstream. When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req )->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter )->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	iscsi-target: Return error if unable to add network portal	Varun Prakash
	commit 83337e544323a8bd7492994d64af339175ac7107 upstream. If iscsit_tpg_add_network_portal() fails then return error code instead of 0 to user space. If iscsi-target returns 0 then user space keeps on retrying same command infinitely, targetcli or echo hangs till command completes with non zero return value. In some cases it is possible that add network portal command never completes with success even after retrying multiple times, for example - cxgbit_setup_np() always returns -EINVAL if portal IP does not belong to Chelsio adapter interface. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> [ bvanassche: Added "Fixes:" and "Cc: stable" tags ] Fixes: commit d4b3fa4b0881 ("iscsi-target: Make iscsi_tpg_np driver show/store use generic code") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	scsi: megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which ↵	Kashyap Desai
	does not support JBOD sequence map commit d5573584429254a14708cf8375c47092b5edaf2c upstream. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	scsi: megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for ↵	Kashyap Desai
	30secs before reset commit 18e1c7f68a5814442abad849abe6eacbf02ffd7c upstream. For SRIOV enabled firmware, if there is a OCR(online controller reset) possibility driver set the convert flag to 1, which is not happening if there are outstanding commands even after 180 seconds. As driver does not set convert flag to 1 and still making the OCR to run, VF(Virtual function) driver is directly writing on to the register instead of waiting for 30 seconds. Setting convert flag to 1 will cause VF driver will wait for 30 secs before going for reset. Signed-off-by: Kiran Kumar Kasturi <kiran-kumar.kasturi@broadcom.com> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	stm class: Fix device leak in open error path	Johan Hovold
	commit a0ebf519b8a2666438d999c62995618c710573e5 upstream. Make sure to drop the reference taken by class_find_device() also on allocation errors in open(). Signed-off-by: Johan Hovold <johan@kernel.org> Fixes: 7bd1d4093c2f ("stm class: Introduce an abstraction for...") Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	vt: fix Scroll Lock LED trigger name	Maciej S. Szmigiero
	commit 31b5929d533f5183972cf57a7844b456ed996f3c upstream. There is a disagreement between drivers/tty/vt/keyboard.c and drivers/input/input-leds.c with regard to what is a Scroll Lock LED trigger name: input calls it "kbd-scrolllock", but vt calls it "kbd-scrollock" (two l's). This prevents Scroll Lock LED trigger from binding to this LED by default. Since it is a scroLL Lock LED, this interface was introduced only about a year ago and in an Internet search people seem to reference this trigger only to set it to this LED let's simply rename it to "kbd-scrolllock". Also, it looks like this was supposed to be changed before this code was merged: https://lkml.org/lkml/2015/6/9/697 but it was done only on the input side. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	block: protect iterate_bdevs() against concurrent close	Rabin Vincent
	commit af309226db916e2c6e08d3eba3fa5c34225200c4 upstream. If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	mei: me: add lewisburg device ids	Tomas Winkler
	commit 9ff2007bea1f1bfc53ac0bc7ccf8200bb275fd52 upstream. Add MEI Lewisburg PCH IDs for Purley based workstations. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	mei: request async autosuspend at the end of enumeration	Alexander Usyskin
	commit d5f8e166c25750adc147b0adf64a62a91653438a upstream. pm_runtime_autosuspend can take synchronous or asynchronous paths, Because we are calling pm_runtime_mark_last_busy just before this most of the cases it takes the asynchronous way. However, when the FW or driver resets during already running runtime suspend, the call will result in calling to the driver's rpm callback and results in a deadlock on device_lock. The simplest fix is to replace pm_runtime_autosuspend with asynchronous pm_request_autosuspend. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drivers/gpu/drm/ast: Fix infinite loop if read fails	Russell Currey
	commit 298360af3dab45659810fdc51aba0c9f4097e4f6 upstream. ast_get_dram_info() configures a window in order to access BMC memory. A BMC register can be configured to disallow this, and if so, causes an infinite loop in the ast driver which renders the system unusable. Fix this by erroring out if an error is detected. On powerpc systems with EEH, this leads to the device being fenced and the system continuing to operate. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161215051241.20815-1-ruscur@russell.cc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amdgpu: fix init save/restore list in gfx_v8.0	Rex Zhu
	commit 202e0b227b906cb80a2791f21216a55d9468d61b upstream. set valid data to mmRLC_SRM_INDEX_CNTL_ADDRx/DATAx. Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amdgpu: fix enable_cp_power_gating in gfx_v8.0.	Rex Zhu
	commit eb584241226958d45aa1f07f4f6a6ea9da98b29e upstream. the CP_PG_DISABLE bit was reversed. Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amd/powerplay: bypass fan table setup if no fan connected	Hawking Zhang
	commit 10e2ca346bf74561ff1b7fff6287716ab976cd8c upstream. If vBIOS noFan bit is set, the fan table parameters in thermal controller will not get initialized. The driver should avoid to use these uninitialized parameter to do calculation. Otherwise, it may trigger divide 0 error. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/gma500: Add compat ioctl	Patrik Jakobsson
	commit 0a97c81a9717431e6c57ea845b59c3c345edce67 upstream. Hook up drm_compat_ioctl to support 32-bit userspace on 64-bit kernels. It turns out that N2600 and N2800 comes with 64-bit enabled. We previously assumed there where no such systems out there. Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161101144315.2955-1-patrik.r.jakobsson@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/radeon/si: load the proper firmware on 0x87 oland boards	Alex Deucher
	commit abb2e3c1ce64c8bba678973800c34ea1dc97c42c upstream. New variant. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/radeon: add additional pci revision to dpm workaround	Alex Deucher
	commit 8729675c00a8d13cb2094d617d70a4a4da7d83c5 upstream. New variant. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/radeon: Hide the HW cursor while it's out of bounds	Michel Dänzer
	commit 6b16cf7785a4200b1bddf4f70c9dda2efc49e278 upstream. Fixes hangs in that case under some circumstances. v2: * Only use non-0 x/yorigin if the cursor is (partially) outside of the top/left edge of the total surface with AVIVO/DCE Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000433 Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/radeon: Also call cursor_move_locked when the cursor size changes	Michel Dänzer
	commit dcab0fa64e300afa18f39cd98d05e0950f652adf upstream. The cursor size also affects the register programming. Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/fifo/gf100-: protect channel preempt with subdev mutex	Ben Skeggs
	commit b27add13f500469127afdf011dbcc9c649e16e54 upstream. This avoids an issue that occurs when we're attempting to preempt multiple channels simultaneously. HW seems to ignore preempt requests while it's still processing a previous one, which, well, makes sense. Fixes random "fifo: SCHED_ERROR 0d []" + GPCCS page faults during parallel piglit runs on (at least) GM107. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/i2c/gk110b,gm10x: use the correct implementation	Ben Skeggs
	commit 5b3800a6b763874e4a23702fb9628d3bd3315ce9 upstream. DPAUX registers moved on Kepler, these chipsets were still using the Fermi implementation for some reason. This fixes detection of hotplug/sink IRQs on DP connectors. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas	Ben Skeggs
	commit 10dcab3e7f477bffee88d518aad57d06777cfdf4 upstream. TTM was changed a while back to allow for pipelining of buffer moves, and part of this was the removal of waiting for a BO to idle before calling move(), placing the responsibility on the driver to do this if required. That's all well and good, except, we make use of move_notify() to handle mapping/unmapping from the GPU VMM as move() isn't called on all paths. This commit adds a wait before unmapping from a VMM in move_notify(), to prevent GPU page faults where a buffer is still being accessed. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/ltc: protect clearing of comptags with mutex	Ben Skeggs
	commit f4e65efc88b64c1dbca275d42a188edccedb56c6 upstream. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/bios: require checksum to match for fast acpi shadow method	Ben Skeggs
	commit 5dc7f4aa9d84ea94b54a9bfcef095f0289f1ebda upstream. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/kms: lvds panel strap moved again on maxwell	Ben Skeggs
	commit 768e847759d551c96e129e194588dbfb11a1d576 upstream. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/nouveau/gr: fallback to legacy paths during firmware lookup	Alexandre Courbot
	commit e137040e0d0376b404fc5155eba44ea07126e3bd upstream. Look for firmware files using the legacy ("nouveau/nvxx_fucxxxx") path if they cannot be found in the new, "official" path. User setups were broken by the switch, which is bad. There are only 4 firmware files we may want to look up that way, so hardcode them into the lookup function. All new firmware files should use the standard "nvidia/<chip>/gr/" path. Fixes: 8539b37acef7 ("drm/nouveau/gr: use NVIDIA-provided external firmwares") Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amd/amdgpu: enable GUI idle INT after enabling CGCG	Arindam Nath
	commit dd31ae9ac933636c3712b7dd0f6152c1d71f81fe upstream. GUI idle interrupts should be enabled only after we have enabled coarse grain clock gating (CGCG). This prevents GFX engine generating idle interrupt even though CGCG is not completely enabled. Most of the time this goes un-noticed, but on some Stoney ASICs this results in GFX engine hang after system resumes from suspend. The issue is not particular to Stoney though and could have occured on any ASIC. The patch fixes this issue. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Sunil Uttarwar <Sunil.Uttarwar1@amd.com> Signed-off-by: Arindam Nath <arindam.nath@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amdgpu: Also call cursor_move_locked when the cursor size changes	Michel Dänzer
	commit 8b02cde994e3025b6886c82eac6cd1e7bc4d1fe9 upstream. The cursor size also affects the register programming. Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amdgpu: Store CRTC relative amdgpu_crtc->cursor_x/y values	Michel Dänzer
	commit 8e57ec613df7d6bfa8ffe7512290c5415ebb8657 upstream. We were storing viewport relative coordinates. However, crtc_cursor_set2 and cursor_reset pass amdgpu_crtc->cursor_x/y as the x/y parameters of cursor_move_locked, which would break if the CRTC isn't located at (0, 0). Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amdgpu: add additional pci revision to dpm workaround	Alex Deucher
	commit ce66cb1e9cbf91fcb216de64a0fe65aa17f97bc1 upstream. New variant. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	drm/amdgpu/si: load the proper firmware on 0x87 oland boards	Alex Deucher
	commit 5a23f2720589ec4757bc62183902d2518f02026e upstream. New variant. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	ACPI / video: Add force_native quirk for HP Pavilion dv6	Hans de Goede
	commit 6276e53fa8c06a3a5cf7b95b77b079966de9ad66 upstream. The HP Pavilion dv6 has a non-working acpi_video0 backlight interface and an intel_backlight interface which works fine. Add a force_native quirk for it so that the non-working acpi_video0 interface does not get registered. Note that there are quite a few HP Pavilion dv6 variants, some woth ATI and some with NVIDIA hybrid gfx, both seem to need this quirk to have working backlight control. There are also some versions with only Intel integrated gfx, these may not need this quirk, but it should not hurt there. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1204476 Link: https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1416940 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	ACPI / video: Add force_native quirk for Dell XPS 17 L702X	Hans de Goede
	commit 350fa038c31b056fc509624efb66348ac2c1e3d0 upstream. The Dell XPS 17 L702X has a non-working acpi_video0 backlight interface and an intel_backlight interface which works fine. Add a force_native quirk for it so that the non-working acpi_video0 interface does not get registered. Note that there also is an issue with the brightnesskeys on this laptop, they do not generate key-press events in anyway. That is not solved by this patch. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1123661 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	staging: comedi: ni_mio_common: fix E series ni_ai_insn_read() data	Ian Abbott
	commit 857a661020a2de3a0304edf33ad656abee100891 upstream. Commit 0557344e2149 ("staging: comedi: ni_mio_common: fix local var for 32-bit read") changed the type of local variable `d` from `unsigned short` to `unsigned int` to fix a bug introduced in commit 9c340ac934db ("staging: comedi: ni_stc.h: add read/write callbacks to struct ni_private") when reading AI data for NI PCI-6110 and PCI-6111 cards. Unfortunately, other parts of the function rely on the variable being `unsigned short` when an offset value in local variable `signbits` is added to `d` before writing the value to the `data` array: d += signbits; data[n] = d; The `signbits` variable will be non-zero in bipolar mode, and is used to convert the hardware's 2's complement, 16-bit numbers to Comedi's straight binary sample format (with 0 representing the most negative voltage). This breaks because `d` is now 32 bits wide instead of 16 bits wide, so after the addition of `signbits`, `data[n]` ends up being set to values above 65536 for negative voltages. This affects all supported "E series" cards except PCI-6143 (and PXI-6143). Fix it by ANDing the value written to the `data[n]` with the mask 0xffff. Fixes: 0557344e2149 ("staging: comedi: ni_mio_common: fix local var for 32-bit read") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	staging: comedi: ni_mio_common: fix M Series ni_ai_insn_read() data mask	Ian Abbott
	commit 655c4d442d1213b617926cc6d54e2a9a793fb46b upstream. For NI M Series cards, the Comedi `insn_read` handler for the AI subdevice is broken due to ANDing the value read from the AI FIFO data register with an incorrect mask. The incorrect mask clears all but the most significant bit of the sample data. It should preserve all the sample data bits. Correct it. Fixes: 817144ae7fda ("staging: comedi: ni_mio_common: remove unnecessary use of 'board->adbits'") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	staging: lustre: ldlm: pl_recalc time handling is wrong	Arnd Bergmann
	commit b8cb86fd95bb461c3496e1f4b4083b198c963a9c upstream. James Simmons reports: > The ldlm_pool field pl_recalc_time is set to the current > monotonic clock value but the interval period is calculated > with the wall clock. This means the interval period will > always be far larger than the pl_recalc_period, which is > just a small interval time period. The correct thing to > do is to use monotomic clock current value instead of the > wall clocks value when calculating recalc_interval_sec. This broke when I converted the 32-bit get_seconds() into ktime_get_{real_,}seconds() inconsistently. Either one of those two would have worked, but mixing them does not. Staying with the original intention of the patch, this changes the ktime_get_seconds() calls into ktime_get_real_seconds(), using real time instead of mononic time. Fixes: 8f83409cf238 ("staging/lustre: use 64-bit time for pl_recalc") Reported-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	staging/lustre/osc: Revert erroneous list_for_each_entry_safe use	Oleg Drokin
	commit cd15dd6ef4ea11df87f717b8b1b83aaa738ec8af upstream. I have been having a lot of unexplainable crashes in osc_lru_shrink lately that I could not see a good explanation for and then I found this patch that slip under the radar somehow that incorrectly converted while loop for lru list iteration into list_for_each_entry_safe totally ignoring that in the body of the loop we drop spinlocks guarding this list and move list entries around. Not sure why it was not showing up right away, perhaps some of the more recent LRU changes committed caused some extra pressure on this code that finally highlighted the breakage. Reverts: 8adddc36b1fc ("staging: lustre: osc: Use list_for_each_entry_safe") CC: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	hv: acquire vmbus_connection.channel_mutex in vmbus_free_channels()	Vitaly Kuznetsov
	commit abd1026da4a7700a8db370947f75cd17b6ae6f76 upstream. "kernel BUG at drivers/hv/channel_mgmt.c:350!" is observed when hv_vmbus module is unloaded. BUG_ON() was introduced in commit 85d9aa705184 ("Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()") as vmbus_free_channels() codepath was apparently forgotten. Fixes: 85d9aa705184 ("Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	docs: sphinx-extensions: make rstFlatTable work with docutils 0.13	Dmitry Shachnev
	commit 217e2bfab22e740227df09f22165e834cddd8a3b upstream. In docutils 0.13, the return type of get_column_widths method of the Table directive has changed [1], which breaks our flat-table directive and leads to a TypeError when trying to build the docs [2]. This patch adds support for the new return type, while keeping support for older docutils versions too. [1] https://sourceforge.net/p/docutils/patches/120/ [2] https://sourceforge.net/p/docutils/bugs/303/ Signed-off-by: Dmitry Shachnev <mitya57@debian.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	thermal: hwmon: Properly report critical temperature in sysfs	Krzysztof Kozlowski
	commit f37fabb8643eaf8e3b613333a72f683770c85eca upstream. In the critical sysfs entry the thermal hwmon was returning wrong temperature to the user-space. It was reporting the temperature of the first trip point instead of the temperature of critical trip point. For example: /sys/class/hwmon/hwmon0/temp1_crit:50000 /sys/class/thermal/thermal_zone0/trip_point_0_temp:50000 /sys/class/thermal/thermal_zone0/trip_point_0_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:120000 /sys/class/thermal/thermal_zone0/trip_point_3_type:critical Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver have been registering a sysfs entry if get_crit_temp() callback was provided. However when accessed, it was calling get_trip_temp() instead of the get_crit_temp(). Fixes: e68b16abd91d ("thermal: add hwmon sysfs I/F") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	clk: bcm2835: Avoid overwriting the div info when disabling a pll_div clk	Boris Brezillon
	commit 68af4fa8f39b542a6cde7ac19518d88e9b3099dc upstream. bcm2835_pll_divider_off() is resetting the divider field in the A2W reg to zero when disabling the clock. Make sure we preserve this value by reading the previous a2w_reg value first and ORing the result with A2W_PLL_CHANNEL_DISABLE. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio domain clocks") Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	arm64: tegra: Add VDD_GPU regulator to Jetson TX1	Alexandre Courbot
	commit 5e6b9a89afceadb1ee45472098f7d20af260335c upstream. Add the VDD_GPU regulator (a GPIO-enabled PWM regulator) to the Jetson TX1 board. This addition allows the GPU to be used provided the bootloader properly enabled the GPU node. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> [as pointed out by Thierry on IRC, nobody has reported a bug in the field, but using a new bootloader with a .dtb that has the incorrect data, it will crash on boot] Fixes: 336f79c7b6d7 ("arm64: tegra: Add NVIDIA Jetson TX1 Developer Kit support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	gpio: chardev: Return error for seek operations	Lars-Peter Clausen
	commit f4e81c529767b9a33d1b27695c54dc84a14af30d upstream. The GPIO chardev is used for management tasks (allocating line and event handles) and does neither support read() nor write() operations. Hence it does not make much sense to allow seek operations. Currently the chardev uses noop_llseek() for its seek implementation. This function does not move the pointer and simply returns the current position (always 0 for the GPIO chardev). noop_llseek() is primarily meant for devices that can not support seek, but where there might be a user that depends on the seek() operation succeeding. For newly added devices that can not support seek operations it is recommended to use no_llseek(), which will return an error. For more information see commit 6038f373a3dc ("llseek: automatically add .llseek fop"). Unfortunately this was overlooked when the GPIO chardev ABI was introduced. But it is highly unlikely that since then userspace applications have appeared that rely on being able to perform non-failing seek operations on a GPIO chardev file descriptor. So it should be safe to change from noop_llseel() to no_seek(). Also use nonseekable_open() in the chardev open() callback to clear the FMODE_SEEK, FMODE_PREAD and FMODE_PWRITE flags from the file. Neither of these should be set on a file that does not support seek operations. Fixes: 3c702e9987e2 ("gpio: add a userspace chardev ABI for GPIOs") Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	gpio: stmpe: fix interrupt handling bug	Linus Walleij
	commit 1516c6350aa2770b8a5e36d40c3ec5078f92ba70 upstream. commit 43db289d00c6 ("gpio: stmpe: Rework registers access") reworked the STMPE register access so as to use [STMPE_IDX__LSB + i] to access the 8bit register for a certain bank, assuming the CSB and MSB will follow after the enumerator. For this to work the index needs to go from (size-1) to 0 not 0 to (size-1). However for the GPIO IRQ handler, the status registers we read register MSB + 3 bytes ahead for the 24 bit GPIOs and index registers from MSB upwards and run an index i over the registers UNLESS we are STMPE1600. This is not working when we get to clearing the interrupt EDGE status register STMPE_IDX_GPEDR_[LCM]SB: it is indexed like all other registers [STMPE_IDX__LSB + i] but in this loop we index from 0 to get the right bank index for the calculations, and we need to just add i to the MSB. Before this, interrupts on the STMPE2401 were broken, this patch fixes it so it works again. Cc: Patrice Chotard <patrice.chotard@st.com> Fixes: 43db289d00c6 ("gpio: stmpe: Rework registers access") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion	Thomas Gleixner
	commit 9c1645727b8fa90d07256fdfcc45bf831242a3ab upstream. The clocksource delta to nanoseconds conversion is using signed math, but the delta is unsigned. This makes the conversion space smaller than necessary and in case of a multiplication overflow the conversion can become negative. The conversion is done with scaled math: s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift; Shifting a signed integer right obvioulsy preserves the sign, which has interesting consequences: - Time jumps backwards - __iter_div_u64_rem() which is used in one of the calling code pathes will take forever to piecewise calculate the seconds/nanoseconds part. This has been reported by several people with different scenarios: David observed that when stopping a VM with a debugger: "It was essentially the stopped by debugger case. I forget exactly why, but the guest was being explicitly stopped from outside, it wasn't just scheduling lag. I think it was something in the vicinity of 10 minutes stopped." When lifting the stop the machine went dead. The stopped by debugger case is not really interesting, but nevertheless it would be a good thing not to die completely. But this was also observed on a live system by Liav: "When the OS is too overloaded, delta will get a high enough value for the msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so after the shift the nsec variable will gain a value similar to 0xffffffffff000000." Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation"). It had been fixed a year ago already in commit 35a4933a8959 ("time: Avoid signed overflow in timekeeping_get_ns()"). Though it's not surprising that the issue has been reintroduced because the function itself and the whole call chain uses s64 for the result and the propagation of it. The change in this recent commit is subtle: s64 nsec; - nsec = (d * m + n) >> s: + nsec = d * m + n; + nsec >>= s; d being type of cycle_t adds another level of obfuscation. This wouldn't have happened if the previous change to unsigned computation would have made the 'nsec' variable u64 right away and a follow up patch had cleaned up the whole call chain. There have been patches submitted which basically did a revert of the above patch leaving everything else unchanged as signed. Back to square one. This spawned a admittedly pointless discussion about potential users which rely on the unsigned behaviour until someone pointed out that it had been fixed before. The changelogs of said patches added further confusion as they made finally false claims about the consequences for eventual users which expect signed results. Despite delta being cycle_t, aka. u64, it's very well possible to hand in a signed negative value and the signed computation will happily return the correct result. But nobody actually sat down and analyzed the code which was added as user after the propably unintended signed conversion. Though in sensitive code like this it's better to analyze it proper and make sure that nothing relies on this than hunting the subtle wreckage half a year later. After analyzing all call chains it stands that no caller can hand in a negative value (which actually would work due to the s64 cast) and rely on the signed math to do the right thing. Change the conversion function to unsigned math. The conversion of all call chains is done in a follow up patch. This solves the starvation issue, which was caused by the negative result, but it does not solve the underlying problem. It merily procrastinates it. When the timekeeper update is deferred long enough that the unsigned multiplication overflows, then time going backwards is observable again. It does neither solve the issue of clocksources with a small counter width which will wrap around possibly several times and cause random time stamps to be generated. But those are usually not found on systems used for virtualization, so this is likely a non issue. I took the liberty to claim authorship for this simply because analyzing all callsites and writing the changelog took substantially more time than just making the simple s/s64/u64/ change and ignore the rest. Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation") Reported-by: David Gibson <david@gibson.dropbear.id.au> Reported-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Laurent Vivier <lvivier@redhat.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	mmc: sd: Meet alignment requirements for raw_ssr DMA	Paul Burton
	commit e85baa8868b016513c0f5738362402495b1a66a5 upstream. The mmc_read_ssr() function results in DMA to the raw_ssr member of struct mmc_card, which is not guaranteed to be cache line aligned & thus might not meet the requirements set out in Documentation/DMA-API.txt: Warnings: Memory coherency operates at a granularity called the cache line width. In order for memory mapped by this API to operate correctly, the mapped region must begin exactly on a cache line boundary and end exactly on one (to prevent two separately mapped regions from sharing a single cache line). Since the cache line size may not be known at compile time, the API will not enforce this requirement. Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries). On some systems where DMA is non-coherent this can lead to us losing data that shares cache lines with the raw_ssr array. Fix this by kmalloc'ing a temporary buffer to perform DMA into. kmalloc will ensure the buffer is suitably aligned, allowing the DMA to be performed without any loss of data. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 5275a652d296 ("mmc: sd: Export SD Status via “ssr” device attribute") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	regulator: stw481x-vmmc: fix ages old enable error	Linus Walleij
	commit 295070e9aa015abb9b92cccfbb1e43954e938133 upstream. The regulator has never been properly enabled, it has been dormant all the time. It's strange that MMC was working at all, but it likely worked by the signals going through the levelshifter and reaching the card anyways. Fixes: 3615a34ea1a6 ("regulator: add STw481x VMMC driver") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09	mmc: sdhci: Fix recovery from tuning timeout	Adrian Hunter
	commit 61e53bd0047d58caee0c7170613045bf96de4458 upstream. Clearing the tuning bits should reset the tuning circuit. However there is more to do. Reset the command and data lines for good measure, and then for eMMC ensure the card is not still trying to process a tuning command by sending a stop command. Note the JEDEC eMMC specification says the stop command (CMD12) can be used to stop a tuning command (CMD21) whereas the SD specification is silent on the subject with respect to the SD tuning command (CMD19). Considering that CMD12 is not a valid SDIO command, the stop command is sent only when the tuning command is CMD21 i.e. for eMMC. That addresses cases seen so far which have been on eMMC. Note that this replaces the commit fe5fb2e3b58f ("mmc: sdhci: Reset cmd and data circuits after tuning failure") which is being reverted for v4.9+. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Dan O'Donovan <dan@emutex.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>