summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2017-12-12Merge branch 'linux-4.9-nxp' into linux-4.9 on Dec. 12, 2017Xie Xiaobo
Signed-off-by: Xiaobo Xie <xiaobo.xie@nxp.com>
2017-12-12powerpc: dts: P1010: Add endianness property to flexcan nodePankaj Bansal
The flexcan driver assumed that flexcan controller is big endian for powerpc architecture and little endian for other architectures. But this is not universally true. flexcan controller can be little or big endian on any architecture. Therefore the flexcan driver has been modified to check for "big-endian" device tree property for controllers that are big endian. consequently add the property to freescale P1010 SOC device tree. Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com> Reviewed-by: Poonam Aggrwal <poonam.aggrwal@nxp.com>
2017-12-12powerpc/fsl/dts: add FMan node for t1042d4rdbMadalin Bucur
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: Scott Wood <oss@buserror.net>
2017-12-12Merge branch 'linux-4.9-nxp' into linux-4.9 on Dec. 8, 2017Xie Xiaobo
Signed-off-by: Xiaobo Xie <xiaobo.xie@nxp.com>
2017-12-12Merge Linaro linux 4.9.62 into linux-4.9Xie Xiaobo
Signed-off-by: Xiaobo Xie <xiaobo.xie@nxp.com>
2017-12-12irqchip/qeic: remove PPCisms for QEICZhao Qiang
[PowerPC part] QEIC was supported on PowerPC, and dependent on PPC, Now it is supported on other platforms, so remove PPCisms. Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
2017-12-12irqchip/qeic: merge qeic init code from platforms to a common functionZhao Qiang
[PowerPC part] The codes of qe_ic init from a variety of platforms are redundant, merge them to a common function and put it to irqchip/irq-qeic.c For non-p1021_mds mpc85xx_mds boards, use "qe_ic_init(np, 0, qe_ic_cascade_low_mpic, qe_ic_cascade_high_mpic);" instead of "qe_ic_init(np, 0, qe_ic_cascade_muxed_mpic, NULL);". qe_ic_cascade_muxed_mpic was used for boards has the same interrupt number for low interrupt and high interrupt, qe_ic_init has checked if "low interrupt == high interrupt" Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
2017-11-15sched/cputime, powerpc32: Fix stale scaled stime on context switchFrederic Weisbecker
[ Upstream commit 90d08ba2b9b4be4aeca6a5b5a4b09fbcde30194d ] On context switch with powerpc32, the cputime is accumulated in the thread_info struct. So the switching-in task must move forward its start time snapshot to the current time in order to later compute the delta spent in system mode. This is what we do for the normal cputime by initializing the starttime field to the value of the previous task's starttime which got freshly updated. But we are missing the update of the scaled cputime start time. As a result we may be accounting too much scaled cputime later. Fix this by initializing the scaled cputime the same way we do for normal cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15powerpc/corenet: explicitly disable the SDHC controller on kmcoge4Valentin Longchamp
[ Upstream commit a674c7d470bb47e82f4eb1fa944eadeac2f6bbaf ] It is not implemented on the kmcoge4 hardware and if not disabled it leads to error messages with the corenet32_smp_defconfig. Signed-off-by: Valentin Longchamp <valentin.longchamp@keymile.com> Signed-off-by: Scott Wood <oss@buserror.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counterLi Zhong
[ Upstream commit 37451bc95dee0e666927d6ffdda302dbbaaae6fa ] Some counters are added in Commit 6e0365b78273 ("KVM: PPC: Book3S HV: Add ICP real mode counters"), to provide some performance statistics to determine whether further optimizing is needed for real mode functions. The n_reject counter counts how many times ICP rejects an irq because of priority in real mode. The redelivery of an lsi that is still asserted after eoi doesn't fall into this category, so the increasement there is removed. Also, it needs to be increased in icp_rm_deliver_irq() if it rejects another one. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-08powerpc/64: Don't try to use radix MMU under a hypervisorPaul Mackerras
[ Upstream commit 18569c1f134e1c5c88228f043c09678ae6052b7c ] Currently, if the kernel is running on a POWER9 processor under a hypervisor, it will try to use the radix MMU even though it doesn't have the necessary code to use radix under a hypervisor (it doesn't negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The result is that the guest kernel will crash when it tries to turn on the MMU. This fixes it by looking for the /chosen/ibm,architecture-vec-5 property, and if it exists, clears the radix MMU feature bit, before we decide whether to initialize for radix or HPT. This property is created by the hypervisor as a result of the guest calling the ibm,client-architecture-support method to indicate its capabilities, so it will indicate whether the hypervisor agreed to us using radix. Systems without a hypervisor may have this property also (for example, skiboot creates it), so we check the HV bit in the MSR to see whether we are running as a guest or not. If we are in hypervisor mode, then we can do whatever we like including using the radix MMU. The reason for using this property is that in future, when we have support for using radix under a hypervisor, we will need to check this property to see whether the hypervisor agreed to us using radix. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTMGreg Kurz
commit ac64115a66c18c01745bbd3c47a36b124e5fd8c0 upstream. The following program causes a kernel oops: #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kvm.h> main() { int fd = open("/dev/kvm", O_RDWR); ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM); } This happens because when using the global KVM fd with KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets called with a NULL kvm argument, which gets dereferenced in is_kvmppc_hv_enabled(). Spotted while reading the code. Let's use the hv_enabled fallback variable, like everywhere else in this function. Fixes: 23528bb21ee2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21powerpc/perf: Add restrictions to PMC5 in power9 DD1Madhavan Srinivasan
[ Upstream commit 8d911904f3ce412b20874a9c95f82009dcbb007c ] PMC5 on POWER9 DD1 may not provide right counts in all sampling scenarios, hence use PM_INST_DISP event instead in PMC2 or PMC3 in preference. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12powerpc/tm: Fix illegal TM state in signal handlerGustavo Romero
commit 044215d145a7a8a60ffa8fdc859d110a795fa6ea upstream. Currently it's possible that on returning from the signal handler through the restore_tm_sigcontexts() code path (e.g. from a signal caught due to a `trap` instruction executed in the middle of an HTM block, or a deliberately constructed sigframe) an illegal TM state (like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets implicitly the MSR register from SRR1 register on return to userspace it causes a TM Bad Thing exception. That illegal state can be set (a) by a malicious user that disables the TM bit by tweaking the bits in uc_mcontext before returning from the signal handler or (b) by a sufficient number of context switches occurring such that the load_tm counter overflows and TM is disabled whilst in the signal handler. This commit fixes the illegal TM state by ensuring that TM bit is always enabled before we return from restore_tm_sigcontexts(). A small comment correction is made as well. Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12powerpc/64s: Use emergency stack for kernel TM Bad Thing program checksCyril Bur
commit 265e60a170d0a0ecfc2d20490134ed2c48dd45ab upstream. When using transactional memory (TM), the CPU can be in one of six states as far as TM is concerned, encoded in the Machine State Register (MSR). Certain state transitions are illegal and if attempted trigger a "TM Bad Thing" type program check exception. If we ever hit one of these exceptions it's treated as a bug, ie. we oops, and kill the process and/or panic, depending on configuration. One case where we can trigger a TM Bad Thing, is when returning to userspace after a system call or interrupt, using RFID. When this happens the CPU first restores the user register state, in particular r1 (the stack pointer) and then attempts to update the MSR. However the MSR update is not allowed and so we take the program check with the user register state, but the kernel MSR. This tricks the exception entry code into thinking we have a bad kernel stack pointer, because the MSR says we're coming from the kernel, but r1 is pointing to userspace. To avoid this we instead always switch to the emergency stack if we take a TM Bad Thing from the kernel. That way none of the user register values are used, other than for printing in the oops message. This is the fix for CVE-2017-1000255. Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Rewrite change log & comments, tweak asm slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGSNaveen N. Rao
commit a4979a7e71eb8da976cbe4a0a1fa50636e76b04f upstream. For DYNAMIC_FTRACE_WITH_REGS, we should be passing-in the original set of registers in pt_regs, to capture the state _before_ ftrace_caller. However, we are instead passing the stack pointer *after* allocating a stack frame in ftrace_caller. Fix this by saving the proper value of r1 in pt_regs. Also, use SAVE_10GPRS() to simplify the code. Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05powerpc/tm: Flush TM only if CPU has TM featureGustavo Romero
commit c1fa0768a8713b135848f78fd43ffc208d8ded70 upstream. Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code to access TM SPRs in flush_tmregs_to_thread(). However flush_tmregs_to_thread() does not check if TM feature is available on CPU before trying to access TM SPRs in order to copy live state to thread structures. flush_tmregs_to_thread() is indeed guarded by CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without TM feature available, thus rendering the execution of TM instructions that are treated by the CPU as illegal instructions. The fix is just to add proper checking in flush_tmregs_to_thread() if CPU has the TM feature before accessing any TM-specific resource, returning immediately if TM is no available on the CPU. Adding that checking in flush_tmregs_to_thread() instead of in places where it is called, like in vsr_get() and vsr_set(), is better because avoids the same problem cropping up elsewhere. Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05powerpc/pseries: Fix parent_dn reference leak in add_dt_node()Tyrel Datwyler
commit b537ca6fede69a281dc524983e5e633d79a10a08 upstream. A reference to the parent device node is held by add_dt_node() for the node to be added. If the call to dlpar_configure_connector() fails add_dt_node() returns ENOENT and that reference is not freed. Add a call to of_node_put(parent_dn) prior to bailing out after a failed dlpar_configure_connector() call. Fixes: 8d5ff320766f ("powerpc/pseries: Make dlpar_configure_connector parent node aware") Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables listPaul Mackerras
commit edd03602d97236e8fea13cd76886c576186aa307 upstream. Al Viro pointed out that while one thread of a process is executing in kvm_vm_ioctl_create_spapr_tce(), another thread could guess the file descriptor returned by anon_inode_getfd() and close() it before the first thread has added it to the kvm->arch.spapr_tce_tables list. That highlights a more general problem: there is no mutual exclusion between writers to the spapr_tce_tables list, leading to the possibility of the list becoming corrupted, which could cause a host kernel crash. To fix the mutual exclusion problem, we add a mutex_lock/unlock pair around the list_del_rce in kvm_spapr_tce_release(). If another thread does guess the file descriptor returned by the anon_inode_getfd() call in kvm_vm_ioctl_create_spapr_tce() and closes it, its call to kvm_spapr_tce_release() will not do any harm because it will have to wait until the first thread has released kvm->lock. The other things that the second thread could do with the guessed file descriptor are to mmap it or to pass it as a parameter to a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM device fd. An mmap call won't cause any harm because kvm_spapr_tce_mmap() and kvm_spapr_tce_fault() don't access the spapr_tce_tables list or the kvmppc_spapr_tce_table.list field, and the fields that they do use have been properly initialized by the time of the anon_inode_getfd() call. The KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl calls kvm_spapr_tce_attach_iommu_group(), which scans the spapr_tce_tables list looking for the kvmppc_spapr_tce_table struct corresponding to the fd given as the parameter. Either it will find the new entry or it won't; if it doesn't, it just returns an error, and if it does, it will function normally. So, in each case there is no harmful effect. [paulus@ozlabs.org - moved parts of the upstream patch into the backport of 47c5310a8dbe, adjusted this commit message accordingly.] Fixes: 366baf28ee3f ("KVM: PPC: Use RCU for arch.spapr_tce_tables") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()Paul Mackerras
commit 47c5310a8dbe7c2cb9f0083daa43ceed76c257fa upstream, with part of commit edd03602d97236e8fea13cd76886c576186aa307 folded in. Nixiaoming pointed out that there is a memory leak in kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd() fails; the memory allocated for the kvmppc_spapr_tce_table struct is not freed, and nor are the pages allocated for the iommu tables. In addition, we have already incremented the process's count of locked memory pages, and this doesn't get restored on error. David Hildenbrand pointed out that there is a race in that the function checks early on that there is not already an entry in the stt->iommu_tables list with the same LIOBN, but an entry with the same LIOBN could get added between then and when the new entry is added to the list. This fixes all three problems. To simplify things, we now call anon_inode_getfd() before placing the new entry in the list. The check for an existing entry is done while holding the kvm->lock mutex, immediately before adding the new entry to the list. Finally, on failure we now call kvmppc_account_memlimit to decrement the process's count of locked memory pages. [paulus@ozlabs.org - folded in that part of edd03602d972 ("KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list", 2017-08-28) which restructured the code that 47c5310a8dbe modified, to avoid a build failure caused by the absence of put_unused_fd().] Fixes: 54738c097163 ("KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode") Fixes: f8626985c7c2 ("KVM: PPC: Account TCE-containing pages in locked_vm") Reported-by: Nixiaoming <nixiaoming@huawei.com> Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27powerpc: Fix DAR reporting when alignment handler faultsMichael Ellerman
commit f9effe925039cf54489b5c04e0d40073bb3a123d upstream. Anton noticed that if we fault part way through emulating an unaligned instruction, we don't update the DAR to reflect that. The DAR value is eventually reported back to userspace as the address in the SEGV signal, and if userspace is using that value to demand fault then it can be confused by us not setting the value correctly. This patch is ugly as hell, but is intended to be the minimal fix and back ports easily. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30powerpc/mm: Ensure cpumask update is orderedBenjamin Herrenschmidt
commit 1a92a80ad386a1a6e3b36d576d52a1a456394b70 upstream. There is no guarantee that the various isync's involved with the context switch will order the update of the CPU mask with the first TLB entry for the new context being loaded by the HW. Be safe here and add a memory barrier to order any subsequent load/store which may bring entries into the TLB. The corresponding barrier on the other side already exists as pte updates use pte_xchg() which uses __cmpxchg_u64 which has a sync after the atomic operation. Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comments in the code] [mpe: Backport to 4.12, minor context change] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-25powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VECBenjamin Herrenschmidt
commit 5a69aec945d27e78abac9fd032533d3aaebf7c1e upstream. VSX uses a combination of the old vector registers, the old FP registers and new "second halves" of the FP registers. Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we'll use the VSX in the kernel (enable_kernel_vsx()) we need to ensure they are all flushed into the thread struct if either of them is individually enabled. Unfortunately we only tested if the whole VSX was enabled, not if they were individually enabled. Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11powerpc/64: Fix __check_irq_replay missing decrementer interruptNicholas Piggin
commit 3db40c312c2c1eb2187c5731102fa8ff380e6e40 upstream. If the decrementer wraps again and de-asserts the decrementer exception while hard-disabled, __check_irq_replay() has a test to notice the wrap when interrupts are re-enabled. The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked because the decrementer interrupt was always the first one checked after clearing the hard disable flag, but HMI check was moved ahead of that, which introduced this bug. This can cause a missed decrementer interrupt if we soft-disable interrupts then take an HMI which is recorded in irq_happened, then hard-disable interrupts for > 4s to wrap the decrementer. Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11powerpc/tm: Fix saving of TM SPRs in core dumpGustavo Romero
commit cd63f3cf1d59b7ad8419eba1cac8f9126e79cc43 upstream. Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR, TFIAR, TEXASR) to the thread struct, unless the process is currently inside a suspended transaction. If the process is core dumping, and the TM SPRs have changed since the last time the process was context switched, then we will save stale values of the TM SPRs to the core dump. Fix it by saving the live register state to the thread struct in that case. Fixes: 08e1c01d6aed ("powerpc/ptrace: Enable support for TM SPR state") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"Greg Kroah-Hartman
This reverts commit b4624ff952ec7d268a9651cd9184a1995befc271 which is commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream. Michal Hocko writes: JFYI. We have encountered a regression after applying this patch on a large ppc machine. While the patch is the right thing to do it doesn't work well with the current vmalloc area size on ppc and large machines where NUMA nodes are very far from each other. Just for the reference the boot fails on such a machine with bunch of warning preceeding it. See http://lkml.kernel.org/r/20170724134240.GL25221@dhcp22.suse.cz It seems the right thing to do is to enlarge the vmalloc space on ppc but this is not the case in the upstream kernel yet AFAIK. It is also questionable whether that is a stable material but I will decision on you here. We have reverted this patch from our 4.4 based kernel. Newer kernels do not have enlarged vmalloc space yet AFAIK so they won't work properly eiter. This bug is quite rare though because you need a specific HW configuration to trigger the issue - namely NUMA nodes have to be far away from each other in the physical memory space. Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07KVM: PPC: Book3S HV: Save/restore host values of debug registersPaul Mackerras
commit 7ceaa6dcd8c6f59588428cec37f3c8093dd1011f upstream. At present, HV KVM on POWER8 and POWER9 machines loses any instruction or data breakpoint set in the host whenever a guest is run. Instruction breakpoints are currently only used by xmon, but ptrace and the perf_event subsystem can set data breakpoints as well as xmon. To fix this, we save the host values of the debug registers (CIABR, DAWR and DAWRX) before entering the guest and restore them on exit. To provide space to save them in the stack frame, we expand the stack frame allocated by kvmppc_hv_entry() from 112 to 144 bytes. [paulus@ozlabs.org - Adjusted stack offsets since we aren't saving POWER9-specific registers.] Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exitPaul Mackerras
commit 4c3bb4ccd074e1a0552078c0bf94c662367a1658 upstream. This restores several special-purpose registers (SPRs) to sane values on guest exit that were missed before. TAR and VRSAVE are readable and writable by userspace, and we need to save and restore them to prevent the guest from potentially affecting userspace execution (not that TAR or VRSAVE are used by any known program that run uses the KVM_RUN ioctl). We save/restore these in kvmppc_vcpu_run_hv() rather than on every guest entry/exit. FSCR affects userspace execution in that it can prohibit access to certain facilities by userspace. We restore it to the normal value for the task on exit from the KVM_RUN ioctl. IAMR is normally 0, and is restored to 0 on guest exit. However, with a radix host on POWER9, it is set to a value that prevents the kernel from executing user-accessible memory. On POWER9, we save IAMR on guest entry and restore it on guest exit to the saved value rather than 0. On POWER8 we continue to set it to 0 on guest exit. PSPB is normally 0. We restore it to 0 on guest exit to prevent userspace taking advantage of the guest having set it non-zero (which would allow userspace to set its SMT priority to high). UAMOR is normally 0. We restore it to 0 on guest exit to prevent the AMR from being used as a covert channel between userspace processes, since the AMR is not context-switched at present. [paulus@ozlabs.org - removed IAMR bits that are only needed on POWER9] Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07KVM: PPC: Book3S HV: Enable TM before accessing TM registersPaul Mackerras
commit e47057151422a67ce08747176fa21cb3b526a2c9 upstream. Commit 46a704f8409f ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly", 2017-06-15) added code to read transactional memory (TM) registers but forgot to enable TM before doing so. The result is that if userspace does have live values in the TM registers, a KVM_RUN ioctl will cause a host kernel crash like this: [ 181.328511] Unrecoverable TM Unavailable Exception f60 at d00000001e7d9980 [ 181.328605] Oops: Unrecoverable TM Unavailable Exception, sig: 6 [#1] [ 181.328613] SMP NR_CPUS=2048 [ 181.328613] NUMA [ 181.328618] PowerNV [ 181.328646] Modules linked in: vhost_net vhost tap nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs +fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat +nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables +ip6table_filter ip6_tables iptable_filter bridge stp llc kvm_hv kvm nfsd ses enclosure scsi_transport_sas ghash_generic +auth_rpcgss gf128mul xts sg ctr nfs_acl lockd vmx_crypto shpchp ipmi_powernv i2c_opal grace ipmi_devintf i2c_core +powernv_rng sunrpc ipmi_msghandler ibmpowernv uio_pdrv_genirq uio leds_powernv powernv_op_panel ip_tables xfs sd_mod +lpfc ipr bnx2x libata mdio ptp pps_core scsi_transport_fc libcrc32c dm_mirror dm_region_hash dm_log dm_mod [ 181.329278] CPU: 40 PID: 9926 Comm: CPU 0/KVM Not tainted 4.12.0+ #1 [ 181.329337] task: c000003fc6980000 task.stack: c000003fe4d80000 [ 181.329396] NIP: d00000001e7d9980 LR: d00000001e77381c CTR: d00000001e7d98f0 [ 181.329465] REGS: c000003fe4d837e0 TRAP: 0f60 Not tainted (4.12.0+) [ 181.329523] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 181.329527] CR: 24022448 XER: 00000000 [ 181.329608] CFAR: d00000001e773818 SOFTE: 1 [ 181.329608] GPR00: d00000001e77381c c000003fe4d83a60 d00000001e7ef410 c000003fdcfe0000 [ 181.329608] GPR04: c000003fe4f00000 0000000000000000 0000000000000000 c000003fd7954800 [ 181.329608] GPR08: 0000000000000001 c000003fc6980000 0000000000000000 d00000001e7e2880 [ 181.329608] GPR12: d00000001e7d98f0 c000000007b19000 00000001295220e0 00007fffc0ce2090 [ 181.329608] GPR16: 0000010011886608 00007fff8c89f260 0000000000000001 00007fff8c080028 [ 181.329608] GPR20: 0000000000000000 00000100118500a6 0000010011850000 0000010011850000 [ 181.329608] GPR24: 00007fffc0ce1b48 0000010011850000 00000000d673b901 0000000000000000 [ 181.329608] GPR28: 0000000000000000 c000003fdcfe0000 c000003fdcfe0000 c000003fe4f00000 [ 181.330199] NIP [d00000001e7d9980] kvmppc_vcpu_run_hv+0x90/0x6b0 [kvm_hv] [ 181.330264] LR [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm] [ 181.330322] Call Trace: [ 181.330351] [c000003fe4d83a60] [d00000001e773478] kvmppc_set_one_reg+0x48/0x340 [kvm] (unreliable) [ 181.330437] [c000003fe4d83b30] [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm] [ 181.330513] [c000003fe4d83b50] [d00000001e7700b4] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm] [ 181.330586] [c000003fe4d83bd0] [d00000001e7642f8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm] [ 181.330658] [c000003fe4d83d40] [c0000000003451b8] do_vfs_ioctl+0xc8/0x8b0 [ 181.330717] [c000003fe4d83de0] [c000000000345a64] SyS_ioctl+0xc4/0x120 [ 181.330776] [c000003fe4d83e30] [c00000000000b004] system_call+0x58/0x6c [ 181.330833] Instruction dump: [ 181.330869] e92d0260 e9290b50 e9290108 792807e3 41820058 e92d0260 e9290b50 e9290108 [ 181.330941] 792ae8a4 794a1f87 408204f4 e92d0260 <7d4022a6> f9490ff0 e92d0260 7d4122a6 [ 181.331013] ---[ end trace 6f6ddeb4bfe92a92 ]--- The fix is just to turn on the TM bit in the MSR before accessing the registers. Fixes: 46a704f8409f ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly") Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07powerpc/pseries: Fix of_node_put() underflow during reconfig removeLaurent Vivier
commit 4fd1bd443e80b12f0a01a45fb9a793206b41cb72 upstream. As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put() underflow during DLPAR remove"), the call to of_node_put() must be removed from pSeries_reconfig_remove_node(). dlpar_detach_node() and pSeries_reconfig_remove_node() both call of_detach_node(), and thus the node should not be released in both cases. Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27powerpc/mm/radix: Properly clear process table entryBenjamin Herrenschmidt
commit c6bb0b8d426a8cf865ca9c8a532cc3a2927cfceb upstream. On radix, the process table entry we want to clear when destroying a context is entry 0, not entry 1. This has no *immediate* consequence on Power9, but it can cause other bugs to become worse. Fixes: 7e381c0ff618 ("powerpc/mm/radix: Add mmu context handling callback for radix") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27powerpc/asm: Mark cr0 as clobbered in mftb()Oliver O'Halloran
commit 2400fd822f467cb4c886c879d8ad99feac9cf319 upstream. The workaround for the CELL timebase bug does not correctly mark cr0 as being clobbered. This means GCC doesn't know that the asm block changes cr0 and might leave the result of an unrelated comparison in cr0 across the block, which we then trash, leading to basically random behaviour. Fixes: 859deea949c3 ("[POWERPC] Cell timebase bug workaround") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Tweak change log and flag for stable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27powerpc: Fix emulation of mfocrf in emulate_step()Anton Blanchard
commit 64e756c55aa46fc18fd53e8f3598b73b528d8637 upstream. From POWER4 onwards, mfocrf() only places the specified CR field into the destination GPR, and the rest of it is set to 0. The PowerPC AS from version 3.0 now requires this behaviour. The emulation code currently puts the entire CR into the destination GPR. Fix it. Fixes: 6888199f7fe5 ("[POWERPC] Emulate more instructions in software") Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27powerpc: Fix emulation of mcrf in emulate_step()Anton Blanchard
commit 87c4b83e0fe234a1f0eed131ab6fa232036860d5 upstream. The mcrf emulation code was using the CR field number directly as the shift value, without taking into account that CR fields are numbered from 0-7 starting at the high bits. That meant it was looking at the CR fields in the reverse order. Fixes: cf87c3f6b647 ("powerpc: Emulate icbi, mcrf and conditional-trap instructions") Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27powerpc/64: Fix atomic64_inc_not_zero() to return an intMichael Ellerman
commit 01e6a61aceb82e13bec29502a8eb70d9574f97ad upstream. Although it's not documented anywhere, there is an expectation that atomic64_inc_not_zero() returns a result which fits in an int. This is the behaviour implemented on all arches except powerpc. This has caused at least one bug in practice, in the percpu-refcount code, where the long result from our atomic64_inc_not_zero() was truncated to an int leading to lost references and stuck systems. That was worked around in that code in commit 966d2b04e070 ("percpu-refcount: fix reference leak during percpu-atomic transition"). To the best of my grepping abilities there are no other callers in-tree which truncate the value, but we should fix it anyway. Because the breakage is subtle and potentially very harmful I'm also tagging it for stable. Code generation is largely unaffected because in most cases the callers are just using the result for a test anyway. In particular the case of fget() that was mentioned in commit a6cf7ed5119f ("powerpc/atomic: Implement atomic*_inc_not_zero") generates exactly the same code. Fixes: a6cf7ed5119f ("powerpc/atomic: Implement atomic*_inc_not_zero") Noticed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()Balbir Singh
commit e71ff982ae4c17d176e9f0132157d54973788377 upstream. Once upon a time there were only two PP (page protection) bits. In ISA 2.03 an additional PP bit was added, but because of the layout of the HPTE it could not be made contiguous with the existing PP bits. The result is that we now have three PP bits, named pp0, pp1, pp2, where pp0 occupies bit 63 of dword 1 of the HPTE and pp1 and pp2 occupy bits 1 and 0 respectively. Until recently Linux hasn't used pp0, however with the addition of _PAGE_KERNEL_RO we started using it. The problem arises in the LPAR code, where we need to translate the PP bits into the argument for the H_PROTECT hypercall. Currently the code only passes bits 0-2 of newpp, which covers pp1, pp2 and N (no execute), meaning pp0 is not passed to the hypervisor at all. We can't simply pass it through in bit 63, as that would collide with a different field in the flags argument, as defined in PAPR. Instead we have to shift it down to bit 8 (IBM bit 55). Fixes: e58e87adc8bf ("powerpc/mm: Update _PAGE_KERNEL_RO") Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Simplify the test, rework change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21powerpc: move ELF_ET_DYN_BASE to 4GB / 4MBKees Cook
commit 47ebb09d54856500c5a5e14824781902b3bb738e upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, which is the traditional x86 minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-14dts: SDK QBMan nodesMadalin Bucur
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
2017-07-14dts: SDK FMan nodesMadalin Bucur
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
2017-07-05powerpc/eeh: Enable IO path on permanent errorGavin Shan
[ Upstream commit 387bbc974f6adf91aa635090f73434ed10edd915 ] We give up recovery on permanent error, simply shutdown the affected devices and remove them. If the devices can't be put into quiet state, they spew more traffic that is likely to cause another unexpected EEH error. This was observed on "p8dtu2u" machine: 0002:00:00.0 PCI bridge: IBM Device 03dc 0002:01:00.0 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.1 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.2 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.3 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) On P8 PowerNV platform, the IO path is frozen when shutdowning the devices, meaning the memory registers are inaccessible. It is why the devices can't be put into quiet state before removing them. This fixes the issue by enabling IO path prior to putting the devices into quiet state. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29powerpc/64s: Handle data breakpoints in Radix modeNaveen N. Rao
commit d89ba5353f301971dd7d2f9fdf25c4432728f38e upstream. On Power9, trying to use data breakpoints throws the splat shown below. This is because the check for a data breakpoint in DSISR is in do_hash_page(), which is not called when in Radix mode. Unable to handle kernel paging request for data at address 0xc000000000e19218 Faulting instruction address: 0xc0000000001155e8 cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20] pc: c0000000001155e8: find_pid_ns+0x48/0xe0 lr: c000000000116ac4: find_task_by_vpid+0x44/0x90 sp: c0000000ef1e7da0 msr: 9000000000009033 dar: c000000000e19218 dsisr: 400000 Move the check to handle_page_fault() so as to catch data breakpoints in both Hash and Radix MMU modes. We have to change the check in do_hash_page() against 0xa410 to use 0xa450, so as to include the value of (DSISR_DABRMATCH << 16). There are two sites that call handle_page_fault() when in Radix, both already pass DSISR in r4. Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Fix the fall-through case on hash, we need to reload DSISR] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29powerpc/kprobes: Pause function_graph tracing during jprobes handlingNaveen N. Rao
commit a9f8553e935f26cb5447f67e280946b0923cd2dc upstream. This fixes a crash when function_graph and jprobes are used together. This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing"), but for powerpc. Jprobes breaks function_graph tracing since the jprobe hook needs to use jprobe_return(), which never returns back to the hook, but instead to the original jprobe'd function. The solution is to momentarily pause function_graph tracing before invoking the jprobe hook and re-enable it when returning back to the original jprobe'd function. Fixes: 6794c78243bf ("powerpc64: port of the function graph tracer") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Context-switch EBB registers properlyPaul Mackerras
commit ca8efa1df1d15a1795a2da57f9f6aada6ed6b946 upstream. This adds code to save the values of three SPRs (special-purpose registers) used by userspace to control event-based branches (EBBs), which are essentially interrupts that get delivered directly to userspace. These registers are loaded up with guest values when entering the guest, and their values are saved when exiting the guest, but we were not saving the host values and restoring them before going back to userspace. On POWER8 this would only affect userspace programs which explicitly request the use of EBBs and also use the KVM_RUN ioctl, since the only source of EBBs on POWER8 is the PMU, and there is an explicit enable bit in the PMU registers (and those PMU registers do get properly context-switched between host and guest). On POWER9 there is provision for externally-generated EBBs, and these are not subject to the control in the PMU registers. Since these registers only affect userspace, we can save them when we first come in from userspace and restore them before returning to userspace, rather than saving/restoring the host values on every guest entry/exit. Similarly, we don't need to worry about their values on offline secondary threads since they execute in the context of the idle task, which never executes in userspace. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29KVM: PPC: Book3S HV: Preserve userspace HTM state properlyPaul Mackerras
commit 46a704f8409f79fd66567ad3f8a7304830a84293 upstream. If userspace attempts to call the KVM_RUN ioctl when it has hardware transactional memory (HTM) enabled, the values that it has put in the HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by guest values. To fix this, we detect this condition and save those SPR values in the thread struct, and disable HTM for the task. If userspace goes to access those SPRs or the HTM facility in future, a TM-unavailable interrupt will occur and the handler will reload those SPRs and re-enable HTM. If userspace has started a transaction and suspended it, we would currently lose the transactional state in the guest entry path and would almost certainly get a "TM Bad Thing" interrupt, which would cause the host to crash. To avoid this, we detect this case and return from the KVM_RUN ioctl with an EINVAL error, with the KVM exit reason set to KVM_EXIT_FAIL_ENTRY. Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29powerpc/perf: Fix oops when kthread execs user processRavi Bangoria
commit bf05fc25f268cd62f147f368fe65ad3e5b04fe9f upstream. When a kthread calls call_usermodehelper() the steps are: 1. allocate current->mm 2. load_elf_binary() 3. populate current->thread.regs While doing this, interrupts are not disabled. If there is a perf interrupt in the middle of this process (i.e. step 1 has completed but not yet reached to step 3) and if perf tries to read userspace regs, kernel oops with following log: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc0000000000da0fc ... Call Trace: perf_output_sample_regs+0x6c/0xd0 perf_output_sample+0x4e4/0x830 perf_event_output_forward+0x64/0x90 __perf_event_overflow+0x8c/0x1e0 record_and_restart+0x220/0x5c0 perf_event_interrupt+0x2d8/0x4d0 performance_monitor_exception+0x54/0x70 performance_monitor_common+0x158/0x160 --- interrupt: f01 at avtab_search_node+0x150/0x1a0 LR = avtab_search_node+0x100/0x1a0 ... load_elf_binary+0x6e8/0x15a0 search_binary_handler+0xe8/0x290 do_execveat_common.isra.14+0x5f4/0x840 call_usermodehelper_exec_async+0x170/0x210 ret_from_kernel_thread+0x5c/0x7c Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace pt_regs are not set. Fixes: ed4a4ef85cf5 ("powerpc/perf: Add support for sampling interrupt register state") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24mm: larger stack guard gap, between vmasHugh Dickins
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: backport to 4.11: adjust context] [wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17powerpc/powernv: Properly set "host-ipi" on IPIsBenjamin Herrenschmidt
[ Upstream commit f83e6862047e1e371bdc5d512dd6cabe8a3965b8 ] Otherwise KVM will fail to pass them through to the host Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14powerpc/kernel: Initialize load_tm on task creationBreno Leitao
commit 7f22ced4377628074e2ac25f41a88f98eb3b03f1 upstream. Currently tsk->thread.load_tm is not initialized in the task creation and can contain garbage on a new task. This is an undesired behaviour, since it affects the timing to enable and disable the transactional memory laziness (disabling and enabling the MSR TM bit, which affects TM reclaim and recheckpoint in the scheduling process). Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14powerpc/kernel: Fix FP and vector register restorationBreno Leitao
commit 1195892c091a15cc862f4e202482a36adc924e12 upstream. Currently tsk->thread->load_vec and load_fp are not initialized during task creation, which can lead to garbage values in these variables (non-zero values). These variables will be checked later in restore_math() to validate if the FP and vector registers are being utilized. Since these values might be non-zero, the restore_math() will continue to save the FP and vectors even if they were never utilized by the userspace application. load_fp and load_vec counters will then overflow (they wrap at 255) and the FP and Altivec will be finally disabled, but before that condition is reached (counter overflow) several context switches will have restored FP and vector registers without need, causing a performance degradation. Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Romero <gusbromero@gmail.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14powerpc/hotplug-mem: Fix missing endian conversion of aa_indexMichael Bringmann
commit dc421b200f91930c9c6a9586810ff8c232cf10fc upstream. When adding or removing memory, the aa_index (affinity value) for the memblock must also be converted to match the endianness of the rest of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval of the attribute will likely lead to non-existent nodes, followed by using the default node in the code inappropriately. Fixes: 5f97b2a0d176 ("powerpc/pseries: Implement memory hotplug add in the kernel") Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>