summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-05powerpc/tm: Flush TM only if CPU has TM featureGustavo Romero
commit c1fa0768a8713b135848f78fd43ffc208d8ded70 upstream. Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code to access TM SPRs in flush_tmregs_to_thread(). However flush_tmregs_to_thread() does not check if TM feature is available on CPU before trying to access TM SPRs in order to copy live state to thread structures. flush_tmregs_to_thread() is indeed guarded by CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without TM feature available, thus rendering the execution of TM instructions that are treated by the CPU as illegal instructions. The fix is just to add proper checking in flush_tmregs_to_thread() if CPU has the TM feature before accessing any TM-specific resource, returning immediately if TM is no available on the CPU. Adding that checking in flush_tmregs_to_thread() instead of in places where it is called, like in vsr_get() and vsr_set(), is better because avoids the same problem cropping up elsewhere. Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05powerpc/pseries: Fix parent_dn reference leak in add_dt_node()Tyrel Datwyler
commit b537ca6fede69a281dc524983e5e633d79a10a08 upstream. A reference to the parent device node is held by add_dt_node() for the node to be added. If the call to dlpar_configure_connector() fails add_dt_node() returns ENOENT and that reference is not freed. Add a call to of_node_put(parent_dn) prior to bailing out after a failed dlpar_configure_connector() call. Fixes: 8d5ff320766f ("powerpc/pseries: Make dlpar_configure_connector parent node aware") Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KEYS: prevent KEYCTL_READ on negative keyEric Biggers
commit 37863c43b2c6464f252862bf2e9768264e961678 upstream. Because keyctl_read_key() looks up the key with no permissions requested, it may find a negatively instantiated key. If the key is also possessed, we went ahead and called ->read() on the key. But the key payload will actually contain the ->reject_error rather than the normal payload. Thus, the kernel oopses trying to read the user_key_payload from memory address (int)-ENOKEY = 0x00000000ffffff82. Fortunately the payload data is stored inline, so it shouldn't be possible to abuse this as an arbitrary memory read primitive... Reproducer: keyctl new_session keyctl request2 user desc '' @s keyctl read $(keyctl show | awk '/user: desc/ {print $1}') It causes a crash like the following: BUG: unable to handle kernel paging request at 00000000ffffff92 IP: user_read+0x33/0xa0 PGD 36a54067 P4D 36a54067 PUD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 211 Comm: keyctl Not tainted 4.14.0-rc1 #337 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 task: ffff90aa3b74c3c0 task.stack: ffff9878c0478000 RIP: 0010:user_read+0x33/0xa0 RSP: 0018:ffff9878c047bee8 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff90aa3d7da340 RCX: 0000000000000017 RDX: 0000000000000000 RSI: 00000000ffffff82 RDI: ffff90aa3d7da340 RBP: ffff9878c047bf00 R08: 00000024f95da94f R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f58ece69740(0000) GS:ffff90aa3e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000ffffff92 CR3: 0000000036adc001 CR4: 00000000003606f0 Call Trace: keyctl_read_key+0xac/0xe0 SyS_keyctl+0x99/0x120 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x7f58ec787bb9 RSP: 002b:00007ffc8d401678 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa RAX: ffffffffffffffda RBX: 00007ffc8d402800 RCX: 00007f58ec787bb9 RDX: 0000000000000000 RSI: 00000000174a63ac RDI: 000000000000000b RBP: 0000000000000004 R08: 00007ffc8d402809 R09: 0000000000000020 R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc8d402800 R13: 00007ffc8d4016e0 R14: 0000000000000000 R15: 0000000000000000 Code: e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb e8 a4 b4 ad ff 85 c0 74 09 80 3d b9 4c 96 00 00 74 43 48 8b b3 20 01 00 00 4d 85 ed <0f> b7 5e 10 74 29 4d 85 e4 74 24 4c 39 e3 4c 89 e2 4c 89 ef 48 RIP: user_read+0x33/0xa0 RSP: ffff9878c047bee8 CR2: 00000000ffffff92 Fixes: 61ea0c0ba904 ("KEYS: Skip key state checks when checking for possession") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KEYS: prevent creating a different user's keyringsEric Biggers
commit 237bbd29f7a049d310d907f4b2716a7feef9abf3 upstream. It was possible for an unprivileged user to create the user and user session keyrings for another user. For example: sudo -u '#3000' sh -c 'keyctl add keyring _uid.4000 "" @u keyctl add keyring _uid_ses.4000 "" @u sleep 15' & sleep 1 sudo -u '#4000' keyctl describe @u sudo -u '#4000' keyctl describe @us This is problematic because these "fake" keyrings won't have the right permissions. In particular, the user who created them first will own them and will have full access to them via the possessor permissions, which can be used to compromise the security of a user's keys: -4: alswrv-----v------------ 3000 0 keyring: _uid.4000 -5: alswrv-----v------------ 3000 0 keyring: _uid_ses.4000 Fix it by marking user and user session keyrings with a flag KEY_FLAG_UID_KEYRING. Then, when searching for a user or user session keyring by name, skip all keyrings that don't have the flag set. Fixes: 69664cf16af4 ("keys: don't generate user and user session keyrings unless they're accessed") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KEYS: fix writing past end of user-supplied buffer in keyring_read()Eric Biggers
commit e645016abc803dafc75e4b8f6e4118f088900ffb upstream. Userspace can call keyctl_read() on a keyring to get the list of IDs of keys in the keyring. But if the user-supplied buffer is too small, the kernel would write the full list anyway --- which will corrupt whatever userspace memory happened to be past the end of the buffer. Fix it by only filling the space that is available. Fixes: b2a4df200d57 ("KEYS: Expand the capacity of a keyring") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05security/keys: rewrite all of big_key cryptoJason A. Donenfeld
commit 428490e38b2e352812e0b765d8bceafab0ec441d upstream. This started out as just replacing the use of crypto/rng with get_random_bytes_wait, so that we wouldn't use bad randomness at boot time. But, upon looking further, it appears that there were even deeper underlying cryptographic problems, and that this seems to have been committed with very little crypto review. So, I rewrote the whole thing, trying to keep to the conventions introduced by the previous author, to fix these cryptographic flaws. It makes no sense to seed crypto/rng at boot time and then keep using it like this, when in fact there's already get_random_bytes_wait, which can ensure there's enough entropy and be a much more standard way of generating keys. Since this sensitive material is being stored untrusted, using ECB and no authentication is simply not okay at all. I find it surprising and a bit horrifying that this code even made it past basic crypto review, which perhaps points to some larger issues. This patch moves from using AES-ECB to using AES-GCM. Since keys are uniquely generated each time, we can set the nonce to zero. There was also a race condition in which the same key would be reused at the same time in different threads. A mutex fixes this issue now. So, to summarize, this commit fixes the following vulnerabilities: * Low entropy key generation, allowing an attacker to potentially guess or predict keys. * Unauthenticated encryption, allowing an attacker to modify the cipher text in particular ways in order to manipulate the plaintext, which is is even more frightening considering the next point. * Use of ECB mode, allowing an attacker to trivially swap blocks or compare identical plaintext blocks. * Key re-use. * Faulty memory zeroing. [Note that in backporting this commit to 4.9, get_random_bytes_wait was replaced with get_random_bytes, since 4.9 does not have the former function. This might result in slightly worse entropy in key generation, but common use cases of big_keys makes that likely not a huge deal. And, this is the best we can do with this old kernel. Alas.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kirill Marinushkin <k.marinushkin@gmail.com> Cc: security@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05security/keys: properly zero out sensitive key material in big_keyJason A. Donenfeld
commit 910801809b2e40a4baedd080ef5d80b4a180e70e upstream. Error paths forgot to zero out sensitive material, so this patch changes some kfrees into a kzfrees. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers3@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kirill Marinushkin <k.marinushkin@gmail.com> Cc: security@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05crypto: talitos - fix hashingLEROY Christophe
commit 886a27c0fc8a34633aadb0986dba11d8c150ae2e upstream. md5sum on some files gives wrong result Exemple: With the md5sum from libkcapi: c15115c05bad51113f81bdaee735dd09 test With the original md5sum: bbdf41d80ba7e8b2b7be3a0772be76cb test This patch fixes this issue Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05crypto: talitos - fix sha224LEROY Christophe
commit afd62fa26343be6445479e75de9f07092a061459 upstream. Kernel crypto tests report the following error at startup [ 2.752626] alg: hash: Test 4 failed for sha224-talitos [ 2.757907] 00000000: 30 e2 86 e2 e7 8a dd 0d d7 eb 9f d5 83 fe f1 b0 00000010: 2d 5a 6c a5 f9 55 ea fd 0e 72 05 22 This patch fixes it Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05crypto: talitos - Don't provide setkey for non hmac hashing algs.LEROY Christophe
commit 56136631573baa537a15e0012055ffe8cfec1a33 upstream. Today, md5sum fails with error -ENOKEY because a setkey function is set for non hmac hashing algs, see strace output below: mmap(NULL, 378880, PROT_READ, MAP_SHARED, 6, 0) = 0x77f50000 accept(3, 0, NULL) = 7 vmsplice(5, [{"bin/\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 378880}], 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 262144 splice(4, NULL, 7, NULL, 262144, SPLICE_F_MORE) = -1 ENOKEY (Required key not available) write(2, "Generation of hash for file kcap"..., 50) = 50 munmap(0x77f50000, 378880) = 0 This patch ensures that setkey() function is set only for hmac hashing. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05crypto: drbg - fix freeing of resourcesStephan Mueller
commit bd6227a150fdb56e7bb734976ef6e53a2c1cb334 upstream. During the change to use aligned buffers, the deallocation code path was not updated correctly. The current code tries to free the aligned buffer pointer and not the original buffer pointer as it is supposed to. Thus, the code is updated to free the original buffer pointer and set the aligned buffer pointer that is used throughout the code to NULL. Fixes: 3cfc3b9721123 ("crypto: drbg - use aligned buffers") CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05drm/radeon: disable hard reset in hibernate for APUsAlex Deucher
commit 820608548737e315c6f93e3099b4e65bde062334 upstream. Fixes a hibernation regression on APUs. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=191571 Fixes: 274ad65c9d02bdc (drm/radeon: hard reset r600 and newer GPU when hibernating.) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse ↵Xin Long
nlmsg properly commit c88f0e6b06f4092995688211a631bb436125d77b upstream. ChunYu found a kernel crash by syzkaller: [ 651.617875] kasan: CONFIG_KASAN_INLINE enabled [ 651.618217] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 651.618731] general protection fault: 0000 [#1] SMP KASAN [ 651.621543] CPU: 1 PID: 9539 Comm: scsi Not tainted 4.11.0.cov #32 [ 651.621938] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 651.622309] task: ffff880117780000 task.stack: ffff8800a3188000 [ 651.622762] RIP: 0010:skb_release_data+0x26c/0x590 [...] [ 651.627260] Call Trace: [ 651.629156] skb_release_all+0x4f/0x60 [ 651.629450] consume_skb+0x1a5/0x600 [ 651.630705] netlink_unicast+0x505/0x720 [ 651.632345] netlink_sendmsg+0xab2/0xe70 [ 651.633704] sock_sendmsg+0xcf/0x110 [ 651.633942] ___sys_sendmsg+0x833/0x980 [ 651.637117] __sys_sendmsg+0xf3/0x240 [ 651.638820] SyS_sendmsg+0x32/0x50 [ 651.639048] entry_SYSCALL_64_fastpath+0x1f/0xc2 It's caused by skb_shared_info at the end of sk_buff was overwritten by ISCSI_KEVENT_IF_ERROR when parsing nlmsg info from skb in iscsi_if_rx. During the loop if skb->len == nlh->nlmsg_len and both are sizeof(*nlh), ev = nlmsg_data(nlh) will acutally get skb_shinfo(SKB) instead and set a new value to skb_shinfo(SKB)->nr_frags by ev->type. This patch is to fix it by checking nlh->nlmsg_len properly there to avoid over accessing sk_buff. Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_listDennis Yang
commit 184a09eb9a2fe425e49c9538f1604b05ed33cfef upstream. In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST set, it indicates that this stripe_head is already in the raid5_plug_cb list and release_stripe() would be called instead to drop a reference count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this stripe_head and it will get queued into the raid5_plug_cb list. Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST, A stripe could be re-added to plug list while it is still on that list in the following situation. If stripe_head A is added to another stripe_head B's batch list, in this case A will have its batch_head != NULL and be added into the plug list. After that, stripe_head B gets handled and called break_stripe_batch_list() to reset all the batched stripe_head(including A which is still on the plug list)'s state and reset their batch_head to NULL. Before the plug list gets processed, if there is another write request comes in and get stripe_head A, A will have its batch_head == NULL (cleared by calling break_stripe_batch_list() on B) and be added to plug list once again. Signed-off-by: Dennis Yang <dennisyang@qnap.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05md/raid5: fix a race condition in stripe batchShaohua Li
commit 3664847d95e60a9a943858b7800f8484669740fc upstream. We have a race condition in below scenario, say have 3 continuous stripes, sh1, sh2 and sh3, sh1 is the stripe_head of sh2 and sh3: CPU1 CPU2 CPU3 handle_stripe(sh3) stripe_add_to_batch_list(sh3) -> lock(sh2, sh3) -> lock batch_lock(sh1) -> add sh3 to batch_list of sh1 -> unlock batch_lock(sh1) clear_batch_ready(sh1) -> lock(sh1) and batch_lock(sh1) -> clear STRIPE_BATCH_READY for all stripes in batch_list -> unlock(sh1) and batch_lock(sh1) ->clear_batch_ready(sh3) -->test_and_clear_bit(STRIPE_BATCH_READY, sh3) --->return 0 as sh->batch == NULL -> sh3->batch_head = sh1 -> unlock (sh2, sh3) In CPU1, handle_stripe will continue handle sh3 even it's in batch stripe list of sh1. By moving sh3->batch_head assignment in to batch_lock, we make it impossible to clear STRIPE_BATCH_READY before batch_head is set. Thanks Stephane for helping debug this tricky issue. Reported-and-tested-by: Stephane Thiell <sthiell@stanford.edu> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05tracing: Erase irqsoff trace with empty writeBo Yan
commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream. One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com Cc: <mingo@redhat.com> Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer") Signed-off-by: Bo Yan <byan@nvidia.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05tracing: Fix trace_pipe behavior for instance tracesTahsin Erdogan
commit 75df6e688ccd517e339a7c422ef7ad73045b18a2 upstream. When reading data from trace_pipe, tracing_wait_pipe() performs a check to see if tracing has been turned off after some data was read. Currently, this check always looks at global trace state, but it should be checking the trace instance where trace_pipe is located at. Because of this bug, cat instances/i1/trace_pipe in the following script will immediately exit instead of waiting for data: cd /sys/kernel/debug/tracing echo 0 > tracing_on mkdir -p instances/i1 echo 1 > instances/i1/tracing_on echo 1 > instances/i1/events/sched/sched_process_exec/enable cat instances/i1/trace_pipe Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com Fixes: 10246fa35d4f ("tracing: give easy way to clear trace buffer") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables listPaul Mackerras
commit edd03602d97236e8fea13cd76886c576186aa307 upstream. Al Viro pointed out that while one thread of a process is executing in kvm_vm_ioctl_create_spapr_tce(), another thread could guess the file descriptor returned by anon_inode_getfd() and close() it before the first thread has added it to the kvm->arch.spapr_tce_tables list. That highlights a more general problem: there is no mutual exclusion between writers to the spapr_tce_tables list, leading to the possibility of the list becoming corrupted, which could cause a host kernel crash. To fix the mutual exclusion problem, we add a mutex_lock/unlock pair around the list_del_rce in kvm_spapr_tce_release(). If another thread does guess the file descriptor returned by the anon_inode_getfd() call in kvm_vm_ioctl_create_spapr_tce() and closes it, its call to kvm_spapr_tce_release() will not do any harm because it will have to wait until the first thread has released kvm->lock. The other things that the second thread could do with the guessed file descriptor are to mmap it or to pass it as a parameter to a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM device fd. An mmap call won't cause any harm because kvm_spapr_tce_mmap() and kvm_spapr_tce_fault() don't access the spapr_tce_tables list or the kvmppc_spapr_tce_table.list field, and the fields that they do use have been properly initialized by the time of the anon_inode_getfd() call. The KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl calls kvm_spapr_tce_attach_iommu_group(), which scans the spapr_tce_tables list looking for the kvmppc_spapr_tce_table struct corresponding to the fd given as the parameter. Either it will find the new entry or it won't; if it doesn't, it just returns an error, and if it does, it will function normally. So, in each case there is no harmful effect. [paulus@ozlabs.org - moved parts of the upstream patch into the backport of 47c5310a8dbe, adjusted this commit message accordingly.] Fixes: 366baf28ee3f ("KVM: PPC: Use RCU for arch.spapr_tce_tables") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()Paul Mackerras
commit 47c5310a8dbe7c2cb9f0083daa43ceed76c257fa upstream, with part of commit edd03602d97236e8fea13cd76886c576186aa307 folded in. Nixiaoming pointed out that there is a memory leak in kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd() fails; the memory allocated for the kvmppc_spapr_tce_table struct is not freed, and nor are the pages allocated for the iommu tables. In addition, we have already incremented the process's count of locked memory pages, and this doesn't get restored on error. David Hildenbrand pointed out that there is a race in that the function checks early on that there is not already an entry in the stt->iommu_tables list with the same LIOBN, but an entry with the same LIOBN could get added between then and when the new entry is added to the list. This fixes all three problems. To simplify things, we now call anon_inode_getfd() before placing the new entry in the list. The check for an existing entry is done while holding the kvm->lock mutex, immediately before adding the new entry to the list. Finally, on failure we now call kvmppc_account_memlimit to decrement the process's count of locked memory pages. [paulus@ozlabs.org - folded in that part of edd03602d972 ("KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list", 2017-08-28) which restructured the code that 47c5310a8dbe modified, to avoid a build failure caused by the absence of put_unused_fd().] Fixes: 54738c097163 ("KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode") Fixes: f8626985c7c2 ("KVM: PPC: Account TCE-containing pages in locked_vm") Reported-by: Nixiaoming <nixiaoming@huawei.com> Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05genirq: Make sparse_irq_lock protect what it should protectThomas Gleixner
commit 12ac1d0f6c3e95732d144ffa65c8b20fbd9aa462 upstream. for_each_active_irq() iterates the sparse irq allocation bitmap. The caller must hold sparse_irq_lock. Several code pathes expect that an active bit in the sparse bitmap also has a valid interrupt descriptor. Unfortunately that's not true. The (de)allocation is a two step process, which holds the sparse_irq_lock only across the queue/remove from the radix tree and the set/clear in the allocation bitmap. If a iteration locks sparse_irq_lock between the two steps, then it might see an active bit but the corresponding irq descriptor is NULL. If that is dereferenced unconditionally, then the kernel oopses. Of course, all iterator sites could be audited and fixed, but.... There is no reason why the sparse_irq_lock needs to be dropped between the two steps, in fact the code becomes simpler when the mutex is held across both and the semantics become more straight forward, so future problems of missing NULL pointer checks in the iteration are avoided and all existing sites are fixed in one go. Expand the lock held sections so both operations are covered and the bitmap and the radixtree are in sync. Fixes: a05a900a51c7 ("genirq: Make sparse_lock a mutex") Reported-and-tested-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05mac80211: flush hw_roc_start work before cancelling the ROCAvraham Stern
commit 6e46d8ce894374fc135c96a8d1057c6af1fef237 upstream. When HW ROC is supported it is possible that after the HW notified that the ROC has started, the ROC was cancelled and another ROC was added while the hw_roc_start worker is waiting on the mutex (since cancelling the ROC and adding another one also holds the same mutex). As a result, the hw_roc_start worker will continue to run after the new ROC is added but before it is actually started by the HW. This may result in notifying userspace that the ROC has started before it actually does, or in case of management tx ROC, in an attempt to tx while not on the right channel. In addition, when the driver will notify mac80211 that the second ROC has started, mac80211 will warn that this ROC has already been notified. Fix this by flushing the hw_roc_start work before cancelling an ROC. Cc: stable@vger.kernel.org Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05mac80211_hwsim: Use proper TX powerBeni Lev
commit 9de981f507474f326e42117858dc9a9321331ae5 upstream. In struct ieee80211_tx_info, control.vif pointer and rate_driver_data[0] falls on the same place, depending on the union usage. During the whole TX process, the union is referred to as a control struct, which holds the vif that is later used in the tx flow, especially in order to derive the used tx power. Referring direcly to rate_driver_data[0] and assigning a value to it, overwrites the vif pointer, hence making all later references irrelevant. Moreover, rate_driver_data[0] isn't used later in the flow in order to retrieve the channel that it is pointing to. Signed-off-by: Beni Lev <beni.lev@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05mac80211: fix VLAN handling with TXQsJohannes Berg
commit 53168215909281a09d3afc6fb51a9d4f81f74d39 upstream. With TXQs, the AP_VLAN interfaces are resolved to their owner AP interface when enqueuing the frame, which makes sense since the frame really goes out on that as far as the driver is concerned. However, this introduces a problem: frames to be encrypted with a VLAN-specific GTK will now be encrypted with the AP GTK, since the information about which virtual interface to use to select the key is taken from the TXQ. Fix this by preserving info->control.vif and using that in the dequeue function. This now requires doing the driver-mapping in the dequeue as well. Since there's no way to filter the frames that are sitting on a TXQ, drop all frames, which may affect other interfaces, when an AP_VLAN is removed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05fs/proc: Report eip/esp in /prod/PID/stat for coredumpingJohn Ogness
commit fd7d56270b526ca3ed0c224362e3c64a0f86687a upstream. Commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp because it is racy and dangerous for executing tasks. The comment adds: As far as I know, there are no use programs that make any material use of these fields, so just get rid of them. However, existing userspace core-dump-handler applications (for example, minicoredumper) are using these fields since they provide an excellent cross-platform interface to these valuable pointers. So that commit introduced a user space visible regression. Partially revert the change and make the readout possible for tasks with the proper permissions and only if the target task has the PF_DUMPCORE flag set. Fixes: 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in> /proc/PID/stat") Reported-by: Marco Felsch <marco.felsch@preh.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Tycho Andersen <tycho.andersen@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linux API <linux-api@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/87poatfwg6.fsf@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05cifs: release auth_key.response for reconnect.Shu Wang
commit f5c4ba816315d3b813af16f5571f86c8d4e897bd upstream. There is a race that cause cifs reconnect in cifs_mount, - cifs_mount - cifs_get_tcp_session - [ start thread cifs_demultiplex_thread - cifs_read_from_socket: -ECONNABORTED - DELAY_WORK smb2_reconnect_server ] - cifs_setup_session - [ smb2_reconnect_server ] auth_key.response was allocated in cifs_setup_session, and will release when the session destoried. So when session re- connect, auth_key.response should be check and released. Tested with my system: CIFS VFS: Free previous auth_key.response = ffff8800320bbf80 A simple auth_key.response allocation call trace: - cifs_setup_session - SMB2_sess_setup - SMB2_sess_auth_rawntlmssp_authenticate - build_ntlmssp_auth_blob - setup_ntlmv2_rsp Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05cifs: release cifs root_cred after exit_cifsShu Wang
commit 94183331e815617246b1baa97e0916f358c794bb upstream. memory leak was found by kmemleak. exit_cifs_spnego should be called before cifs module removed, or cifs root_cred will not be released. kmemleak report: unreferenced object 0xffff880070a3ce40 (size 192): backtrace: kmemleak_alloc+0x4a/0xa0 kmem_cache_alloc+0xc7/0x1d0 prepare_kernel_cred+0x20/0x120 init_cifs_spnego+0x2d/0x170 [cifs] 0xffffffffc07801f3 do_one_initcall+0x51/0x1b0 do_init_module+0x60/0x1fd load_module+0x161e/0x1b60 SYSC_finit_module+0xa9/0x100 SyS_finit_module+0xe/0x10 Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27Linux 4.9.52Greg Kroah-Hartman
2017-09-27bcache: fix bch_hprint crash and improve outputMichael Lyle
commit 9276717b9e297a62d1151a43d1cd286213f68eb7 upstream. Most importantly, solve a crash where %llu was used to format signed numbers. This would cause a buffer overflow when reading sysfs writeback_rate_debug, as only 20 bytes were allocated for this and %llu writes 20 characters plus a null. Always use the units mechanism rather than having different output paths for simplicity. Also, correct problems with display output where 1.10 was a larger number than 1.09, by multiplying by 10 and then dividing by 1024 instead of dividing by 100. (Remainders of >= 1000 would print as .10). Minor changes: Always display the decimal point instead of trying to omit it based on number of digits shown. Decide what units to use based on 1000 as a threshold, not 1024 (in other words, always print at most 3 digits before the decimal point). Signed-off-by: Michael Lyle <mlyle@lyle.org> Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: fix for gc and write-back raceTang Junhui
commit 9baf30972b5568d8b5bc8b3c46a6ec5b58100463 upstream. gc and write-back get raced (see the email "bcache get stucked" I sended before): gc thread write-back thread | |bch_writeback_thread() |bch_gc_thread() | | |==>read_dirty() |==>bch_btree_gc() | |==>btree_root() //get btree root | | //node write locker | |==>bch_btree_gc_root() | | |==>read_dirty_submit() | |==>write_dirty() | |==>continue_at(cl, | | write_dirty_finish, | | system_wq); | |==>write_dirty_finish()//excute | | //in system_wq | |==>bch_btree_insert() | |==>bch_btree_map_leaf_nodes() | |==>__bch_btree_map_nodes() | |==>btree_root //try to get btree | | //root node read | | //lock | |-----stuck here |==>bch_btree_set_root() |==>bch_journal_meta() |==>bch_journal() |==>journal_try_write() |==>journal_write_unlocked() //journal_full(&c->journal) | //condition satisfied |==>continue_at(cl, journal_write, system_wq); //try to excute | //journal_write in system_wq | //but work queue is excuting | //write_dirty_finish() |==>closure_sync(); //wait journal_write execute | //over and wake up gc, |-------------stuck here |==>release root node write locker This patch alloc a separate work-queue for write-back thread to avoid such race. (Commit log re-organized by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: Correct return value for sysfs attach errorsTony Asleson
commit 77fa100f27475d08a569b9d51c17722130f089e7 upstream. If you encounter any errors in bch_cached_dev_attach it will return a negative error code. The variable 'v' which stores the result is unsigned, thus user space sees a very large value returned for bytes written which can cause incorrect user space behavior. Utilize 1 signed variable to use throughout the function to preserve error return capability. Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: correct cache_dirty_target in __update_writeback_rate()Tang Junhui
commit a8394090a9129b40f9d90dcb7f4a49d60c727ca6 upstream. __update_write_rate() uses a Proportion-Differentiation Controller algorithm to control writeback rate. A dirty target number is used in this PD controller to control writeback rate. A larger target number will make the writeback rate smaller, on the versus, a smaller target number will make the writeback rate larger. bcache uses the following steps to calculate the target number, 1) cache_sectors = all-buckets-of-cache-set * buckets-size 2) cache_dirty_target = cache_sectors * cached-device-writeback_percent 3) target = cache_dirty_target * (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set) The calculation at step 1) for cache_sectors is incorrect, which does not consider dirty blocks occupied by flash only volume. A flash only volume can be took as a bcache device without cached device. All data sectors allocated for it are persistent on cache device and marked dirty, they are not touched by bcache writeback and garbage collection code. So data blocks of flash only volume should be ignore when calculating cache_sectors of cache set. Current code does not subtract dirty sectors of flash only volume, which results a larger target number from the above 3 steps. And in sequence the cache device's writeback rate is smaller then a correct value, writeback speed is slower on all cached devices. This patch fixes the incorrect slower writeback rate by subtracting dirty sectors of flash only volumes in __update_writeback_rate(). (Commit log composed by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: do not subtract sectors_to_gc for bypassed IOTang Junhui
commit 69daf03adef5f7bc13e0ac86b4b8007df1767aab upstream. Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to trigger gc thread. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: Fix leak of bdev referenceJan Kara
commit 4b758df21ee7081ab41448d21d60367efaa625b3 upstream. If blkdev_get_by_path() in register_bcache() fails, we try to lookup the block device using lookup_bdev() to detect which situation we are in to properly report error. However we never drop the reference returned to us from lookup_bdev(). Fix that. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: initialize dirty stripes in flash_dev_run()Tang Junhui
commit 175206cf9ab63161dec74d9cd7f9992e062491f5 upstream. bcache uses a Proportion-Differentiation Controller algorithm to control writeback rate to cached devices. In the PD controller algorithm, dirty stripes of thin flash device should not be counted in, because flash only volumes never write back dirty data. Currently dirty stripe counter for thin flash device is not initialized when the thin flash device starts. Which means the following calculation in PD controller will reference an undefined dirty stripes number, and all cached devices attached to the same cache set where the thin flash device lies on may have an inaccurate writeback rate. This patch calles bch_sectors_dirty_init() in flash_dev_run(), to correctly initialize dirty stripe counter when the thin flash device starts to run. This patch also does following parameter data type change, -void bch_sectors_dirty_init(struct cached_dev *dc); +void bch_sectors_dirty_init(struct bcache_device *); to call this function conveniently in flash_dev_run(). (Commit log is composed by Coly Li) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27PM / devfreq: Fix memory leak when fail to register deviceChanwoo Choi
commit 9e14de1077e9c34f141cf98bdba60cdd5193d962 upstream. When the devfreq_add_device fails to register deivce, the memory leak of devfreq instance happen. So, this patch fix the memory leak issue. Before freeing the devfreq instance checks whether devfreq instance is NULL or not because the device_unregister() frees the devfreq instance when jumping to the 'err_init'. It is to prevent the duplicate the kfee(devfreq). Fixes: ac4b281176a5 ("PM / devfreq: fix duplicated kfree on devfreq pointer") Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27media: uvcvideo: Prevent heap overflow when accessing mapped controlsGuenter Roeck
commit 7e09f7d5c790278ab98e5f2c22307ebe8ad6e8ba upstream. The size of uvc_control_mapping is user controlled leading to a potential heap overflow in the uvc driver. This adds a check to verify the user provided size fits within the bounds of the defined buffer size. Originally-from: Richard Simmons <rssimmo@amazon.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27media: v4l2-compat-ioctl32: Fix timespec conversionDaniel Mentz
commit 9c7ba1d7634cef490b85bc64c4091ff004821bfd upstream. Certain syscalls like recvmmsg support 64 bit timespec values for the X32 ABI. The helper function compat_put_timespec converts a timespec value to a 32 bit or 64 bit value depending on what ABI is used. The v4l2 compat layer, however, is not designed to support 64 bit timespec values and always uses 32 bit values. Hence, compat_put_timespec must not be used. Without this patch, user space will be provided with bad timestamp values from the VIDIOC_DQEVENT ioctl. Also, fields of the struct v4l2_event32 that come immediately after timestamp get overwritten, namely the field named id. Fixes: 81993e81a994 ("compat: Get rid of (get|put)_compat_time(val|spec)") Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Tiffany Lin <tiffany.lin@mediatek.com> Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Daniel Mentz <danielmentz@google.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27s390/mm: fix race on mm->context.flush_mmMartin Schwidefsky
commit 60f07c8ec5fae06c23e9fd7bab67dabce92b3414 upstream. The order in __tlb_flush_mm_lazy is to flush TLB first and then clear the mm->context.flush_mm bit. This can lead to missed flushes as the bit can be set anytime, the order needs to be the other way aronud. But this leads to a different race, __tlb_flush_mm_lazy may be called on two CPUs concurrently. If mm->context.flush_mm is cleared first then another CPU can bypass __tlb_flush_mm_lazy although the first CPU has not done the flush yet. In a virtualized environment the time until the flush is finally completed can be arbitrarily long. Add a spinlock to serialize __tlb_flush_mm_lazy and use the function in finish_arch_post_lock_switch as well. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27s390/mm: fix local TLB flushing vs. detach of an mm address spaceMartin Schwidefsky
commit b3e5dc45fd1ec2aa1de6b80008f9295eb17e0659 upstream. The local TLB flushing code keeps an additional mask in the mm.context, the cpu_attach_mask. At the time a global flush of an address space is done the cpu_attach_mask is copied to the mm_cpumask in order to avoid future global flushes in case the mm is used by a single CPU only after the flush. Trouble is that the reset of the mm_cpumask is racy against the detach of an mm address space by switch_mm. The current order is first the global TLB flush and then the copy of the cpu_attach_mask to the mm_cpumask. The order needs to be the other way around. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()Manfred Spraul
commit 3ef0c7a730de0bae03d86c19570af764fa3c4445 upstream. As we want to remove spin_unlock_wait() and replace it with explicit spin_lock()/spin_unlock() calls, we can use this to simplify the locking. In addition: - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering. - The new code avoids the backwards loop. Only slightly tested, I did not manage to trigger calls to nf_conntrack_all_lock(). V2: With improved comments, to clearly show how the barriers pair. Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets") Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27PCI: pciehp: Report power fault only once until we clear itKeith Busch
commit 7612b3b28c0b900dcbcdf5e9b9747cc20a1e2455 upstream. When a power fault occurs, the power controller sets Power Fault Detected in the Slot Status register, and pciehp_isr() queues an INT_POWER_FAULT event to handle it. It also clears Power Fault Detected, but since nothing has yet changed to correct the power fault, the power controller will likely set it again immediately, which may cause an infinite loop when pcie_isr() rechecks Slot Status. Fix that by masking off Power Fault Detected from new events if the driver hasn't seen the power fault clear from the previous handling attempt. Fixes: fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") Signed-off-by: Keith Busch <keith.busch@intel.com> [bhelgaas: changelog, pull test out and add comment] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27PCI: shpchp: Enable bridge bus mastering if MSI is enabledAleksandr Bezzubikov
commit 48b79a14505349a29b3e20f03619ada9b33c4b17 upstream. An SHPC may generate MSIs to notify software about slot or controller events (SHPC spec r1.0, sec 4.7). A PCI device can only generate an MSI if it has bus mastering enabled. Enable bus mastering if the bridge contains an SHPC that uses MSI for event notifications. Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ARC: Re-enable MMU upon Machine Check exceptionJose Abreu
commit 1ee55a8f7f6b7ca4c0c59e0b4b4e3584a085c2d3 upstream. I recently came upon a scenario where I would get a double fault machine check exception tiriggered by a kernel module. However the ensuing crash stacktrace (ksym lookup) was not working correctly. Turns out that machine check auto-disables MMU while modules are allocated in kernel vaddr spapce. This patch re-enables the MMU before start printing the stacktrace making stacktracing of modules work upon a fatal exception. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: moved code into low level handler to avoid in 2 places] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27tracing: Apply trace_clock changes to instance max bufferBaohong Liu
commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream. Currently trace_clock timestamps are applied to both regular and max buffers only for global trace. For instance trace, trace_clock timestamps are applied only to regular buffer. But, regular and max buffers can be swapped, for example, following a snapshot. So, for instance trace, bad timestamps can be seen following a snapshot. Let's apply trace_clock timestamps to instance max buffer as well. Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers") Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27tracing: Add barrier to trace_printk() buffer nesting modificationSteven Rostedt (VMware)
commit 3d9622c12c8873911f4cc0ccdabd0362c2fca06b upstream. trace_printk() uses 4 buffers, one for each context (normal, softirq, irq and NMI), such that it does not need to worry about one context preempting the other. There's a nesting counter that gets incremented to figure out which buffer to use. If the context gets preempted by another context which calls trace_printk() it will increment the counter and use the next buffer, and restore the counter when it is finished. The problem is that gcc may optimize the modification of the buffer nesting counter and it may not be incremented in memory before the buffer is used. If this happens, and the context gets interrupted by another context, it could pick the same buffer and corrupt the one that is being used. Compiler barriers need to be added after the nesting variable is incremented and before it is decremented to prevent usage of the context buffers by more than one context at the same time. Cc: Andy Lutomirski <luto@kernel.org> Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count") Hat-tip-to: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ftrace: Fix memleak when unregistering dynamic ops when tracing disabledSteven Rostedt (VMware)
commit edb096e00724f02db5f6ec7900f3bbd465c6c76f upstream. If function tracing is disabled by the user via the function-trace option or the proc sysctl file, and a ftrace_ops that was allocated on the heap is unregistered, then the shutdown code exits out without doing the proper clean up. This was found via kmemleak and running the ftrace selftests, as one of the tests unregisters with function tracing disabled. # cat kmemleak unreferenced object 0xffffffffa0020000 (size 4096): comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s) hex dump (first 32 bytes): 55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH. e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L backtrace: [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0 [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0 [<ffffffff8109697f>] module_alloc+0x4f/0x90 [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420 [<ffffffff81249947>] ftrace_startup+0xe7/0x300 [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90 [<ffffffff81263786>] trace_selftest_ops+0x204/0x397 [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624 [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7 [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192 [<ffffffff81002230>] do_one_initcall+0x90/0x1e2 [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe [<ffffffff81d61ec3>] kernel_init+0x13/0x122 [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ftrace: Fix selftest goto location on errorSteven Rostedt (VMware)
commit 46320a6acc4fb58f04bcf78c4c942cc43b20f986 upstream. In the second iteration of trace_selftest_ops(), the error goto label is wrong in the case where trace_selftest_test_global_cnt is off. In the case of error, it leaks the dynamic ops that was allocated. Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: qla2xxx: Fix an integer overflow in sysfs codeDan Carpenter
commit e6f77540c067b48dee10f1e33678415bfcc89017 upstream. The value of "size" comes from the user. When we add "start + size" it could lead to an integer overflow bug. It means we vmalloc() a lot more memory than we had intended. I believe that on 64 bit systems vmalloc() can succeed even if we ask it to allocate huge 4GB buffers. So we would get memory corruption and likely a crash when we call ha->isp_ops->write_optrom() and ->read_optrom(). Only root can trigger this bug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061 Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.") Reported-by: shqking <shqking@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: qla2xxx: Correction to vha->vref_count timeoutJoe Carnuccio
commit 6e98095f8fb6d98da34c4e6c34e69e7c638d79c0 upstream. Fix incorrect second argument for wait_event_timeout() Fixes: c4a9b538ab2a ("qla2xxx: Allow vref count to timeout on vport delete.") Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLEHannes Reinecke
commit 3e0097499839e0fe3af380410eababe5a47c4cf9 upstream. When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is returned; the remaining part will then contain stale kernel memory information. This patch zeroes out the entire table to avoid this issue. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>