summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-02-01v4l: tvp5150: Fix comment regarding output pin muxingLaurent Pinchart
commit b4b2de386bbb6589d81596999d4a924928dc119b upstream. The FID/GLCO/VLK/HVLK and INTREQ/GPCL/VBLK pins are muxed differently depending on whether the input is an S-Video or composite signal. The comment that explains the logic doesn't reflect the code. It appears that the comment is incorrect, as disabling the output data bus in composite mode makes no sense. Update the comment to match the code. While at it define macros for the MISC_CTL register bits, the code is too confusing with numerical values. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01v4l: tvp5150: Reset device at probe time, not in get/set format handlersLaurent Pinchart
commit aff808e813fc2d311137754165cf53d4ee6ddcc2 upstream. The tvp5150 doesn't support format setting through the subdev pad API and thus implements the set format handler as a get format operation. The single handler, tvp5150_fill_fmt(), resets the device by calling tvp5150_reset(). This causes malfunction as the device can be reset at will, possibly from userspace when the subdev userspace API is enabled. The reset call was added in commit ec2c4f3f93cb ("[media] media: tvp5150: Add mbus_fmt callbacks"), probably as an attempt to set the device to a known state before detecting the current TV standard. However, the get format handler doesn't access the hardware to get the TV standard since commit 963ddc63e20d ("[media] media: tvp5150: Add cropping support"). There is thus no need to reset the device when getting the format. However, removing the tvp5150_reset() from the get/set format handlers results in the function not being called at all if the bridge driver doesn't use the .reset() operation. The operation is nowadays abused and shouldn't be used, so shouldn't expect bridge drivers to call it. To make sure the device is properly initialize, move the reset call from the format handlers to the probe function. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01pctv452e: move buffer to heap, no mutexMax Kellermann
commit 48775cb73c2e26b7ca9d679875a6e570c8b8e124 upstream. commit 73d5c5c864f4 ("[media] pctv452e: don't do DMA on stack") caused a NULL pointer dereference which occurs when dvb_usb_init() calls dvb_usb_device_power_ctrl() for the first time, before the frontend has been attached. It also caused a recursive deadlock because tt3650_ci_msg_locked() has already locked the mutex. So, partially revert it, but move the buffer to the heap (DMA capable), not to the stack (may not be DMA capable). Instead of sharing one buffer which needs mutex protection, do a new heap allocation for each call. Fixes: commit 73d5c5c864f4 ("[media] pctv452e: don't do DMA on stack") Signed-off-by: Max Kellermann <max.kellermann@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01iw_cxgb4: free EQ queue memory on last derefSteve Wise
commit c12a67fec8d99bb554e8d4e99120d418f1a39c87 upstream. Commit ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") introduced a bug where the RDMA QP EQ queue memory (and QIDs) are possibly freed before the underlying connection has been fully shutdown. The result being a possible DMA read issued by HW after the queue memory has been unmapped and freed. This results in possible WR corruption in the worst case, system bus errors if an IOMMU is in use, and SGE "bad WR" errors reported in the very least. The fix is to defer unmap/free of queue memory and QID resources until the QP struct has been fully dereferenced. To do this, the c4iw_ucontext must also be kept around until the last QP that references it is fully freed. In addition, since the last QP deref can happen in an IRQ disabled context, we need a new workqueue thread to do the final unmap/free of the EQ queue memory. Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01SUNRPC: cleanup ida information when removing sunrpc moduleKinglong Mee
commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream. After removing sunrpc module, I get many kmemleak information as, unreferenced object 0xffff88003316b1e0 (size 544): comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0 [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0 [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150 [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180 [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd] [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd] [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd] [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc] [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc] [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc] [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0 [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0 [<ffffffffb0390f1f>] vfs_write+0xef/0x240 [<ffffffffb0392fbd>] SyS_write+0xad/0x130 [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9 [<ffffffffffffffff>] 0xffffffffffffffff I found, the ida information (dynamic memory) isn't cleanup. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01NFSv4.0: always send mode in SETATTR after EXCLUSIVE4Benjamin Coddington
commit a430607b2ef7c3be090f88c71cfcb1b3988aa7c0 upstream. Some nfsv4.0 servers may return a mode for the verifier following an open with EXCLUSIVE4 createmode, but this does not mean the client should skip setting the mode in the following SETATTR. It should only do that for EXCLUSIVE4_1 or UNGAURDED createmode. Fixes: 5334c5bdac92 ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01NFSv4.1: Fix a deadlock in layoutgetTrond Myklebust
commit 8ac092519ad91931c96d306c4bfae2c6587c325f upstream. We cannot call nfs4_handle_exception() without first ensuring that the slot has been freed. If not, we end up deadlocking with the process waiting for recovery to complete, and recovery waiting for the slot table to drain. Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01nfs: Don't increment lock sequence ID after NFS4ERR_MOVEDChuck Lever
commit 059aa734824165507c65fd30a55ff000afd14983 upstream. Xuan Qi reports that the Linux NFSv4 client failed to lock a file that was migrated. The steps he observed on the wire: 1. The client sent a LOCK request to the source server 2. The source server replied NFS4ERR_MOVED 3. The client switched to the destination server 4. The client sent the same LOCK request to the destination server with a bumped lock sequence ID 5. The destination server rejected the LOCK request with NFS4ERR_BAD_SEQID RFC 3530 section 8.1.5 provides a list of NFS errors which do not bump a lock sequence ID. However, RFC 3530 is now obsoleted by RFC 7530. In RFC 7530 section 9.1.7, this list has been updated by the addition of NFS4ERR_MOVED. Reported-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01parisc: Don't use BITS_PER_LONG in userspace-exported swab.h headerHelge Deller
commit 2ad5d52d42810bed95100a3d912679d8864421ec upstream. In swab.h the "#if BITS_PER_LONG > 32" breaks compiling userspace programs if BITS_PER_LONG is #defined by userspace with the sizeof() compiler builtin. Solve this problem by using __BITS_PER_LONG instead. Since we now #include asm/bitsperlong.h avoid further potential userspace pollution by moving the #define of SHIFT_PER_LONG to bitops.h which is not exported to userspace. This patch unbreaks compiling qemu on hppa/parisc. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01ARC: [arcompact] handle unaligned access delay slot corner caseVineet Gupta
commit 9aed02feae57bf7a40cb04ea0e3017cb7a998db4 upstream. After emulating an unaligned access in delay slot of a branch, we pretend as the delay slot never happened - so return back to actual branch target (or next PC if branch was not taken). Curently we did this by handling STATUS32.DE, we also need to clear the BTA.T bit, which is disregarded when returning from original misaligned exception, but could cause weirdness if it took the interrupt return path (in case interrupt was acive too) One ARC700 customer ran into this when enabling unaligned access fixup for kernel mode accesses as well Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01ARC: udelay: fix inline assembler by adding LP_COUNT to clobber listVineet Gupta
commit 36425cd67052e3becf325fd4d3ba5691791ef7e4 upstream. commit 3c7c7a2fc8811bc ("ARC: Don't use "+l" inline asm constraint") modified the inline assembly to setup LP_COUNT register manually and NOT rely on gcc to do it (with the +l inline assembler contraint hint, now being retired in the compiler) However the fix was flawed as we didn't add LP_COUNT to asm clobber list, meaning gcc doesn't know that LP_COUNT or zero-delay-loops are in action in the inline asm. This resulted in some fun - as nested ZOL loops were being generared | mov lp_count,250000 ;16 # tmp235, | lp .L__GCC__LP14 # <======= OUTER LOOP (gcc generated) | .L14: | ld r2, [r5] # MEM[(volatile u32 *)prephitmp_43], w | dmb 1 | breq r2, -1, @.L21 #, w,, | bbit0 r2,1,@.L13 # w,, | ld r4,[r7] ;25 # loops_per_jiffy, loops_per_jiffy | mpymu r3,r4,r6 #, loops_per_jiffy, tmp234 | | mov lp_count, r3 # <====== INNER LOOP (from inline asm) | lp 1f | nop | 1: | nop_s | .L__GCC__LP14: ; loop end, start is @.L14 #, This caused issues with drivers relying on sane behaviour of udelay friends. With LP_COUNT added to clobber list, gcc doesn't generate the outer loop in say above case. Addresses STAR 9001146134 Reported-by: Joao Pinto <jpinto@synopsys.com> Fixes: 3c7c7a2fc8811bc ("ARC: Don't use "+l" inline asm constraint") Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01can: ti_hecc: add missing prepare and unprepare of the clockYegor Yefremov
commit befa60113ce7ea270cb51eada28443ca2756f480 upstream. In order to make the driver work with the common clock framework, this patch converts the clk_enable()/clk_disable() to clk_prepare_enable()/clk_disable_unprepare(). Also add error checking for clk_prepare_enable(). Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01can: c_can_pci: fix null-pointer-deref in c_can_start() - set device pointerEinar Jón
commit c97c52be78b8463ac5407f1cf1f22f8f6cf93a37 upstream. The priv->device pointer for c_can_pci is never set, but it is used without a NULL check in c_can_start(). Setting it in c_can_pci_probe() like c_can_plat_probe() prevents c_can_pci.ko from crashing, with and without CONFIG_PM. This might also cause the pm_runtime_*() functions in c_can.c to actually be executed for c_can_pci devices - they are the only other place where priv->device is used, but they all contain a null check. Signed-off-by: Einar Jón <tolvupostur@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01IB/srp: fix invalid indirect_sg_entries parameter valueIsrael Rukshin
commit 0a475ef4226e305bdcffe12b401ca1eab06c4913 upstream. After setting indirect_sg_entries module_param to huge value (e.g 500,000), srp_alloc_req_data() fails to allocate indirect descriptors for the request ring (kmalloc fails). This commit enforces the maximum value of indirect_sg_entries to be SG_MAX_SEGMENTS as signified in module param description. Fixes: 65e8617fba17 (scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS) Fixes: c07d424d6118 (IB/srp: add support for indirect tables that don't fit in SRP_CMD) Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>-- Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01IB/srp: fix mr allocation when the device supports sg gapsIsrael Rukshin
commit ad8e66b4a80182174f73487ed25fd2140cf43361 upstream. If the device support arbitrary sg list mapping (device cap IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with IB_MR_TYPE_SG_GAPS. Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures") Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01IB/iser: Fix sg_tablesize calculationMax Gurtovoy
commit 1e5db6c31ade4150c2e2b1a21e39f776c38fea39 upstream. For devices that can register page list that is bigger than USHRT_MAX, we actually take the wrong value for sg_tablesize. E.g: for CX4 max_fast_reg_page_list_len is 65536 (bigger than USHRT_MAX) so we set sg_tablesize to 0 by mistake. Therefore, each IO that is bigger than 4k splitted to "< 4k" chunks that cause performance degredation. Remove wrong sg_tablesize assignment, and use the value that was set during address resolution handler with the needed casting. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01IB/cxgb3: fix misspelling in header guardNicolas Iooss
commit b1a27eac7fefff33ccf6acc919fc0725bf9815fb upstream. Use CXGB3_... instead of CXBG3_... Fixes: a85fb3383340 ("IB/cxgb3: Move user vendor structures") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Steve Wise <swise@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01s390/ptrace: Preserve previous registers for short regset writeMartin Schwidefsky
commit 9dce990d2cf57b5ed4e71a9cdbd7eae4335111ff upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. convert_vx_to_fp() is adapted to handle only a specified number of registers rather than unconditionally handling all of them: other callers of this function are adapted appropriately. Based on an initial patch by Dave Martin. Reported-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01s390/mm: Fix cmma unused transfer from pgste into pteChristian Borntraeger
commit 0d6da872d3e4a60f43c295386d7ff9a4cdcd57e9 upstream. The last pgtable rework silently disabled the CMMA unused state by setting a local pte variable (a parameter) instead of propagating it back into the caller. Fix it. Fixes: ebde765c0e85 ("s390/mm: uninline ptep_xxx functions from pgtable.h") Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabledJack Morgenstein
commit b4cfe3971f6eab542dd7ecc398bfa1aeec889934 upstream. If IPV6 has not been enabled in the underlying kernel, we must avoid calling IPV6 procedures in rdma_cm.ko. This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements surrounding any code which calls external IPV6 procedures. In the instance fixed here, procedure cma_bind_addr() called ipv6_addr_type() -- which resulted in calling external procedure __ipv6_addr_type(). Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution") Cc: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operationsOmar Sandoval
commit 57b59ed2e5b91e958843609c7884794e29e6c4cb upstream. Subvolume directory inodes can't have ACLs. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01Btrfs: disable xattr operations on subvolume directoriesOmar Sandoval
commit 1fdf41941b8010691679638f8d0c8d08cfee7726 upstream. When you snapshot a subvolume containing a subvolume, you get a placeholder directory where the subvolume would be. These directory inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously, these i_ops didn't include the xattr operation callbacks. The conversion to xattr_handlers missed this case, leading to bogus attempts to set xattrs on these inodes. This manifested itself as failures when running delayed inodes. To fix this, clear IOP_XATTR in ->i_opflags on these inodes. Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations") Cc: Andreas Gruenbacher <agruenba@redhat.com> Reported-by: Chris Murphy <lists@colorremedies.com> Tested-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01Btrfs: remove old tree_root case in btrfs_read_locked_inode()Omar Sandoval
commit 67ade058ef2c65a3e56878af9c293ec76722a2e5 upstream. As Jeff explained in c2951f32d36c ("btrfs: remove old tree_root dirent processing in btrfs_real_readdir()"), supporting this old format is no longer necessary since the Btrfs magic number has been updated since we changed to the current format. There are other places where we still handle this old format, but since this is part of a fix that is going to stable, I'm only removing this one for now. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01ISDN: eicon: silence misleading array-bounds warningArnd Bergmann
commit 950eabbd6ddedc1b08350b9169a6a51b130ebaaf upstream. With some gcc versions, we get a warning about the eicon driver, and that currently shows up as the only remaining warning in one of the build bots: In file included from ../drivers/isdn/hardware/eicon/message.c:30:0: eicon/message.c: In function 'mixer_notify_update': eicon/platform.h:333:18: warning: array subscript is above array bounds [-Warray-bounds] The code is easily changed to open-code the unusual PUT_WORD() line causing this to avoid the warning. Link: http://arm-soc.lixom.net/buildlogs/stable-rc/v4.4.45/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01xfs: prevent quotacheck from overloading inode lruBrian Foster
commit e0d76fa4475ef2cf4b52d18588b8ce95153d021b upstream. Quotacheck runs at mount time in situations where quota accounting must be recalculated. In doing so, it uses bulkstat to visit every inode in the filesystem. Historically, every inode processed during quotacheck was released and immediately tagged for reclaim because quotacheck runs before the superblock is marked active by the VFS. In other words, the final iput() lead to an immediate ->destroy_inode() call, which allowed the XFS background reclaim worker to start reclaiming inodes. Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") marks the XFS superblock active sooner as part of the mount process to support caching inodes processed during log recovery. This occurs before quotacheck and thus means all inodes processed by quotacheck are inserted to the LRU on release. The s_umount lock is held until the mount has completed and thus prevents the shrinkers from operating on the sb. This means that quotacheck can excessively populate the inode LRU and lead to OOM conditions on systems without sufficient RAM. Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on inodes processed by quotacheck. This causes ->drop_inode() to return 1 and in turn causes iput_final() to evict the inode. This preserves the original quotacheck behavior and prevents it from overloading the LRU and running out of memory. Reported-by: Martin Svec <martin.svec@zoner.cz> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01sysctl: fix proc_doulongvec_ms_jiffies_minmax()Eric Dumazet
commit ff9f8a7cf935468a94d9927c68b00daae701667e upstream. We perform the conversion between kernel jiffies and ms only when exporting kernel value to user space. We need to do the opposite operation when value is written by user. Only matters when HZ != 1000 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01userns: Make ucounts lock irq-safeNikolay Borisov
commit 880a38547ff08715ce4f1daf9a4bb30c87676e68 upstream. The ucounts_lock is being used to protect various ucounts lifecycle management functionalities. However, those services can also be invoked when a pidns is being freed in an RCU callback (e.g. softirq context). This can lead to deadlocks. There were already efforts trying to prevent similar deadlocks in add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock"), however they just moved the context from hardirq to softrq. Fix this issue once and for all by explictly making the lock disable irqs altogether. Dmitry Vyukov <dvyukov@google.com> reported: > I've got the following deadlock report while running syzkaller fuzzer > on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid > device if it matters): > > ================================= > [ INFO: inconsistent lock state ] > 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted > --------------------------------- > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. > swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes: > (ucounts_lock){+.?...}, at: [< inline >] spin_lock > ./include/linux/spinlock.h:302 > (ucounts_lock){+.?...}, at: [<ffff2000081678c8>] > put_ucounts+0x60/0x138 kernel/ucount.c:162 > {SOFTIRQ-ON-W} state was registered at: > [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054 > [< inline >] mark_irqflags kernel/locking/lockdep.c:2941 > [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295 > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753 > [< inline >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144 > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151 > [< inline >] spin_lock ./include/linux/spinlock.h:302 > [< inline >] get_ucounts kernel/ucount.c:131 > [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189 > [< inline >] inc_mnt_namespaces fs/namespace.c:2818 > [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849 > [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959 > [< inline >] init_mount_tree fs/namespace.c:3199 > [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251 > [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626 > [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648 > [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456 > irq event stamp: 2316924 > hardirqs last enabled at (2316924): [< inline >] rcu_do_batch > kernel/rcu/tree.c:2911 > hardirqs last enabled at (2316924): [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > hardirqs last enabled at (2316924): [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > hardirqs last enabled at (2316924): [<ffff200008210414>] > rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166 > hardirqs last disabled at (2316923): [< inline >] rcu_do_batch > kernel/rcu/tree.c:2900 > hardirqs last disabled at (2316923): [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > hardirqs last disabled at (2316923): [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > hardirqs last disabled at (2316923): [<ffff20000820fe80>] > rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166 > softirqs last enabled at (2316912): [<ffff20000811b4c4>] > _local_bh_enable+0x4c/0x80 kernel/softirq.c:155 > softirqs last disabled at (2316913): [< inline >] > do_softirq_own_stack ./include/linux/interrupt.h:488 > softirqs last disabled at (2316913): [< inline >] > invoke_softirq kernel/softirq.c:371 > softirqs last disabled at (2316913): [<ffff20000811c994>] > irq_exit+0x264/0x308 kernel/softirq.c:405 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(ucounts_lock); > <Interrupt> > lock(ucounts_lock); > > *** DEADLOCK *** > > 1 lock held by swapper/2/0: > #0: (rcu_callback){......}, at: [< inline >] __rcu_reclaim > kernel/rcu/rcu.h:108 > #0: (rcu_callback){......}, at: [< inline >] rcu_do_batch > kernel/rcu/tree.c:2919 > #0: (rcu_callback){......}, at: [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > #0: (rcu_callback){......}, at: [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > #0: (rcu_callback){......}, at: [<ffff200008210390>] > rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166 > > stack backtrace: > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6 > Hardware name: Hardkernel ODROID-C2 (DT) > Call trace: > [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500 > [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225 > [<ffff2000088a99e0>] dump_stack+0x110/0x168 > [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc > kernel/locking/lockdep.c:2387 > [< inline >] print_usage_bug kernel/locking/lockdep.c:2357 > [< inline >] valid_state kernel/locking/lockdep.c:2400 > [< inline >] mark_lock_irq kernel/locking/lockdep.c:2617 > [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065 > [< inline >] mark_irqflags kernel/locking/lockdep.c:2923 > [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295 > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753 > [< inline >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144 > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151 > [< inline >] spin_lock ./include/linux/spinlock.h:302 > [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162 > [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214 > [< inline >] dec_pid_namespaces kernel/pid_namespace.c:89 > [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156 > [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 > [< inline >] rcu_do_batch kernel/rcu/tree.c:2919 > [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3182 > [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3149 > [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166 > [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284 > [< inline >] do_softirq_own_stack ./include/linux/interrupt.h:488 > [< inline >] invoke_softirq kernel/softirq.c:371 > [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405 > [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636 > [<ffff200008081c80>] gic_handle_irq+0x68/0xd8 > Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00) > 7dc0: ffff8000648d4b3c 0000000000000007 > 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967 > 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001 > 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002 > 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000 > 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140 > 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d > 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00 > 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145 > 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478 > [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486 > [< inline >] arch_local_irq_restore > ./arch/arm64/include/asm/irqflags.h:81 > [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030 > [< inline >] cpuidle_idle_call kernel/sched/idle.c:200 > [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243 > [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345 > [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358 > arch/arm64/kernel/smp.c:276 > [<000000000279f1a4>] 0x279f1a4 Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Fixes: add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock") Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces") Link: https://www.spinics.net/lists/kernel/msg2426637.html Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01vring: Force use of DMA API for ARM-based systems with legacy devicesWill Deacon
commit c7070619f3408d9a0dffbed9149e6f00479cf43b upstream. Booting Linux on an ARM fastmodel containing an SMMU emulation results in an unexpected I/O page fault from the legacy virtio-blk PCI device: [ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002 [ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 [ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000 [ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 <system hangs failing to read partition table> This is because the legacy virtio-blk device is behind an SMMU, so we have consequently swizzled its DMA ops and configured the SMMU to translate accesses. This then requires the vring code to use the DMA API to establish translations, otherwise all transactions will result in fatal faults and termination. Given that ARM-based systems only see an SMMU if one is really present (the topology is all described by firmware tables such as device-tree or IORT), then we can safely use the DMA API for all legacy virtio devices. Modern devices can advertise the prescense of an IOMMU using the VIRTIO_F_IOMMU_PLATFORM feature flag. Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Fixes: 876945dbf649 ("arm64: Hook up IOMMU dma_ops") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01mm, page_alloc: fix premature OOM when racing with cpuset mems updateVlastimil Babka
commit e47483bca2cc59a4593b37a270b16ee42b1d9f08 upstream. Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers OOM killer in few seconds, despite lots of free memory. The test attempts to repeatedly fault in memory in one process in a cpuset, while changing allowed nodes of the cpuset between 0 and 1 in another process. The problem comes from insufficient protection against cpuset changes, which can cause get_page_from_freelist() to consider all zones as non-eligible due to nodemask and/or current->mems_allowed. This was masked in the past by sufficient retries, but since commit 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator") we fix the preferred_zoneref once, and don't iterate over the whole zonelist in further attempts, thus the only eligible zones might be placed in the zonelist before our starting point and we always miss them. A previous patch fixed this problem for current->mems_allowed. However, cpuset changes also update the task's mempolicy nodemask. The fix has two parts. We have to repeat the preferred_zoneref search when we detect cpuset update by way of seqcount, and we have to check the seqcount before considering OOM. [akpm@linux-foundation.org: fix typo in comment] Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01mm, page_alloc: move cpuset seqcount checking to slowpathVlastimil Babka
commit 5ce9bfef1d27944c119a397a9d827bef795487ce upstream. This is a preparation for the following patch to make review simpler. While the primary motivation is a bug fix, this also simplifies the fast path, although the moved code is only enabled when cpusets are in use. Link: http://lkml.kernel.org/r/20170120103843.24587-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01mm, page_alloc: fix fast-path race with cpuset update or removalVlastimil Babka
commit 16096c25bf0ca5d87e4fa6ec6108ba53feead212 upstream. Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers OOM killer in few seconds, despite lots of free memory. The test attempts to repeatedly fault in memory in one process in a cpuset, while changing allowed nodes of the cpuset between 0 and 1 in another process. One possible cause is that in the fast path we find the preferred zoneref according to current mems_allowed, so that it points to the middle of the zonelist, skipping e.g. zones of node 1 completely. If the mems_allowed is updated to contain only node 1, we never reach it in the zonelist, and trigger OOM before checking the cpuset_mems_cookie. This patch fixes the particular case by redoing the preferred zoneref search if we switch back to the original nodemask. The condition is also slightly changed so that when the last non-root cpuset is removed, we don't miss it. Note that this is not a full fix, and more patches will follow. Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01mm, page_alloc: fix check for NULL preferred_zoneVlastimil Babka
commit ea57485af8f4221312a5a95d63c382b45e7840dc upstream. Patch series "fix premature OOM regression in 4.7+ due to cpuset races". This is v2 of my attempt to fix the recent report based on LTP cpuset stress test [1]. The intention is to go to stable 4.9 LTSS with this, as triggering repeated OOMs is not nice. That's why the patches try to be not too intrusive. Unfortunately why investigating I found that modifying the testcase to use per-VMA policies instead of per-task policies will bring the OOM's back, but that seems to be much older and harder to fix problem. I have posted a RFC [2] but I believe that fixing the recent regressions has a higher priority. Longer-term we might try to think how to fix the cpuset mess in a better and less error prone way. I was for example very surprised to learn, that cpuset updates change not only task->mems_allowed, but also nodemask of mempolicies. Until now I expected the parameter to alloc_pages_nodemask() to be stable. I wonder why do we then treat cpusets specially in get_page_from_freelist() and distinguish HARDWALL etc, when there's unconditional intersection between mempolicy and cpuset. I would expect the nodemask adjustment for saving overhead in g_p_f(), but that clearly doesn't happen in the current form. So we have both crazy complexity and overhead, AFAICS. [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz This patch (of 4): Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") we have a wrong check for NULL preferred_zone, which can theoretically happen due to concurrent cpuset modification. We check the zoneref pointer which is never NULL and we should check the zone pointer. Also document this in first_zones_zonelist() comment per Michal Hocko. Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01mm/mempolicy.c: do not put mempolicy before using its nodemaskVlastimil Babka
commit d51e9894d27492783fc6d1b489070b4ba66ce969 upstream. Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by mpol_cond_put() before accessing the embedded nodemask by __alloc_pages_nodemask(). The commit log says it's so "we can use a single exit path within the function" but that's clearly wrong. We can still do that when doing mpol_cond_put() after the allocation attempt. Make sure the mempolicy is not freed prematurely, otherwise __alloc_pages_nodemask() can end up using a bogus nodemask, which could lead e.g. to premature OOM. Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma") Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thpKeno Fischer
commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream. In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true. While in this state the process is stil SIGKILLable, but little else works (e.g. no ptrace attach, no other signals). This is easily reproduced with the following code (assuming thp are set to always): #include <assert.h> #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #define TEST_SIZE 5 * 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void *addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com Signed-off-by: Keno Fischer <keno@juliacomputing.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm/atomic: clear out fence when duplicating stateLucas Stach
[Fixed differently in 4.10] The fence needs to be cleared out, otherwise the following commit might wait on a stale fence from the previous commit. This was fixed as a side effect of 9626014258a5 (drm/fence: add in-fences support) in kernel 4.10. As this commit introduces new functionality and as such can not be applied to stable, this patch is the minimal fix for the kernel 4.9 stable series. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01Revert "drm/radeon: always apply pci shutdown callbacks"Alex Deucher
commit b9b487e494712c8e5905b724e12f5ef17e9ae6f9 upstream. This seems to break reboot on some evergreen systems. bugs: https://bugs.freedesktop.org/show_bug.cgi?id=99524 https://bugzilla.kernel.org/show_bug.cgi?id=192271 This reverts commit a481daa88fd4d6b54f25348972bba10b5f6a84d0. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm/vc4: fix a bounds checkDan Carpenter
commit 21ccc32496b2f63228f5232b3ac0e426e8fb3c31 upstream. We accidentally return success even if vc4_full_res_bounds_check() fails. Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm/vc4: Return -EINVAL on the overflow checks failing.Eric Anholt
commit 6b8ac63847bc2f958dd93c09edc941a0118992d9 upstream. By failing to set the errno, we'd continue on to trying to set up the RCL, and then oops on trying to dereference the tile_bo that binning validation should have set up. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm/vc4: Fix an integer overflow in temporary allocation layout.Eric Anholt
commit 0f2ff82e11c86c05d051cae32b58226392d33bbf upstream. We copy the unvalidated ioctl arguments from the user into kernel temporary memory to run the validation from, to avoid a race where the user updates the unvalidate contents in between validating them and copying them into the validated BO. However, in setting up the layout of the kernel side, we failed to check one of the additions (the roundup() for shader_rec_offset) against integer overflow, allowing a nearly MAX_UINT value of bin_cl_size to cause us to under-allocate the temporary space that we then copy_from_user into. Reported-by: Murray McAllister <murray.mcallister@insomniasec.com> Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm/vc4: Fix memory leak of the CRTC state.Eric Anholt
commit 7622b25543665567d8830a63210385b7d705924b upstream. The underscores variant frees the pointers inside, while the no-underscores variant calls underscores and then frees the struct. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: d8dbf44f13b9 ("drm/vc4: Make the CRTCs cooperate on allocating display lists.") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm/i915: Ignore bogus plane coordinates on SKL when the plane is not visibleVille Syrjälä
commit 3bfdfdcbce2796ce75bf2d85fd8471858d702e5d upstream. When the plane is invisible we may have all sorts of bogus stuff in the coordinates, which we must ignore or else we might fail the plane update. This started to happen on SKL when I moved the plane offset computation to happen in the check phase. Previously we happily ignored it all since we never called the update_plane hook with an invisible plane. Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Fixes: b63a16f6cd89 ("drm/i915: Compute display surface offset in the plane check hook for SKL+") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98258 Testcase: igt/pm_rpm/legacy-planes Testcase: igt/pm_rpm/universal-planes Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478550057-24864-3-git-send-email-ville.syrjala@linux.intel.com (cherry picked from commit a5e4c7d0aa6784d8abe95c3ceef0da9656d17468) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm: Fix broken VT switch with video=1366x768 optionTakashi Iwai
commit fdf35a6b22247746a7053fc764d04218a9306f82 upstream. I noticed that the VT switch doesn't work any longer with a Dell laptop with 1366x768 eDP when the machine is connected with a DP monitor. It behaves as if VT were switched, but the graphics remain frozen. Actually the keyboard works, so I could switch back to VT7 again. I tried to track down the problem, and encountered a long story until we reach to this error: - The machine is booted with video=1366x768 option (the distro installer seems to add it as default). - Recently, drm_helper_probe_single_connector_modes() deals with cmdline modes, and it tries to create a new mode when no matching mode is found. - The drm_mode_create_from_cmdline_mode() creates a mode based on either CVT of GFT according to the given cmdline mode; in our case, it's 1366x768. - Since both CVT and GFT can't express the width 1366 due to alignment, the resultant mode becomes 1368x768, slightly larger than the given size. - Later on, the atomic commit is performed, and in drm_atomic_check_only(), the size of each plane is checked. - The size check of 1366x768 fails due to the above, and eventually the whole VT switch fails. Back in the history, we've had a manual fix-up of 1368x768 in various places via c09dedb7a50e ("drm/edid: Add a workaround for 1366x768 HD panel"), but they have been all in drm_edid.c at probing the modes from EDID. For addressing the problem above, we need a similar hack to the mode newly created from cmdline, manually adjusting the width when the expected size is 1366 while we get 1368 instead. Fixes: eaf99c749d43 ("drm: Perform cmdline mode parsing during...") Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://patchwork.freedesktop.org/patch/msgid/20170109145614.29454-1-tiwai@suse.de Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01drm: Schedule the output_poll_work with 1s delay if we have delayed eventPeter Ujfalusi
commit 68f458eec7069d618a6c884ca007426e0cea411b upstream. Instead of scheduling the work to handle the initial delayed event, use 1s delay. This delay should not be needed, but Optimus/nouveau will fail in a mysterious way if the delayed event is handled as soon as possible like it is done in drm_helper_probe_single_connector_modes() in case the poll was enabled before. Reverting 339fd36238dd would give back the 10 sec (!) delay to handle the delayed event. Adding 1sec delay to the poll_work is enough to work around the issue in Optimus setups and gives shorter response on handling the initial delayed event. Fixes: 339fd36238dd ("drm: drm_probe_helper: Fix output_poll_work scheduling") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> [danvet: Add FIXME to the comment to make it stick out more.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170109143158.21917-1-peter.ujfalusi@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01tile/ptrace: Preserve previous registers for short regset writeDave Martin
commit fd7c99142d77dc4a851879a66715abf12a3193fb upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01fbdev: color map copying bounds checkingKees Cook
commit 2dc705a9930b4806250fbf5a76e55266e59389f2 upstream. Copying color maps to userspace doesn't check the value of to->start, which will cause kernel heap buffer OOB read due to signedness wraps. CVE-2016-8405 Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Peter Pi (@heisecode) of Trend Micro Cc: Min Chong <mchong@google.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26Linux 4.9.6Greg Kroah-Hartman
2017-01-26libceph: stop allocating a new cipher on every crypto requestIlya Dryomov
commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream. This is useless and more importantly not allowed on the writeback path, because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which can recurse back into the filesystem: kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080 Workqueue: ceph-msgr ceph_con_workfn [libceph] ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318 ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480 00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8 Call Trace: [<ffffffff951eb4e1>] ? schedule+0x31/0x80 [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10 [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130 [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30 [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs] [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270 [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0 [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120 [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs] [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190 [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0 [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320 [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350 [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0 [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60 [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560 [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0 [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130 [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580 [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph] [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130 [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60 [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc] [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130 [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180 [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0 [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150 [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120 [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph] [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph] [<ffffffff950d40a0>] ? release_sock+0x40/0x90 [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0 [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph] [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph] [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph] [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph] [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph] [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph] [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph] [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph] [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph] [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd] [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410 [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480 [<ffffffff94c95170>] ? process_one_work+0x410/0x410 [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0 [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40 [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190 Allocating the cipher along with the key fixes the issue - as long the key doesn't change, a single cipher context can be used concurrently in multiple requests. We still can't take that GFP_KERNEL allocation though. Both ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here. Reported-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26libceph: uninline ceph_crypto_key_destroy()Ilya Dryomov
commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26tools/virtio/ringtest: fix run-on-all.sh for offline cpusHalil Pasic
commit 21f5eda9b8671744539c8295b9df62991fffb2ce upstream. Since ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work without /dev/cpu") run-on-all.sh uses seq 0 $HOST_AFFINITY as the list of ids of the CPUs to run the command on (assuming ids of online CPUs are consecutive and start from 0), where $HOST_AFFINITY is the highest CPU id in the system previously determined using lscpu. This can fail on systems with offline CPUs. Instead let's use lscpu to determine the list of online CPUs. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Fixes: ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work without /dev/cpu") Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26selftest/powerpc: Wrong PMC initialized in pmc56_overflow testMadhavan Srinivasan
commit df21d2fa733035e4d414379960f94b2516b41296 upstream. Test uses PMC2 to count the event. But PMC1 is being initialized. Patch to fix it. Fixes: 3752e453f6ba ('selftests/powerpc: Add tests of PMU EBBs') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>