summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm
AgeCommit message (Collapse)Author
2016-05-20Merge tag 'powerpc-4.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits) powerpc/86xx: Fix PCI interrupt map definition powerpc/86xx: Move pci1 definition to the include file powerpc/fsl: Fix build of the dtb embedded kernel images powerpc/fsl: Fix rcpm compatible string powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC powerpc/fsl-pci: Add a workaround for PCI 5 errata powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb powerpc/powernv/npu: Add PE to PHB's list powerpc/powernv: Fix insufficient memory allocation powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" powerpc/powernv/npu: Enable NVLink pass through powerpc/powernv/npu: Rework TCE Kill handling powerpc/powernv/npu: Add set/unset window helpers powerpc/powernv/ioda2: Export debug helper pe_level_printk() ...
2016-05-13KVM: halt_polling: provide a way to qualify wakeups during pollChristian Borntraeger
Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-12KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interruptsPaul Mackerras
Commit c9a5eccac1ab ("kvm/eventfd: add arch-specific set_irq", 2015-10-16) added the possibility for architecture-specific code to handle the generation of virtual interrupts in atomic context where possible, without having to schedule a work function. Since we can easily generate virtual interrupts on XICS without having to do anything worse than take a spinlock, we define a kvm_arch_set_irq_inatomic() for XICS. We also remove kvm_set_msi() since it is not used any more. The one slightly tricky thing is that with the new interface, we don't get told whether the interrupt is an MSI (or other edge sensitive interrupt) vs. level-sensitive. The difference as far as interrupt generation is concerned is that for LSIs we have to set the asserted flag so it will continue to fire until it is explicitly cleared. In fact the XICS code gets told which interrupts are LSIs by userspace when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES attribute group on the XICS device. To store this information, we add a new "lsi" field to struct ics_irq_state. With that we can also do a better job of returning accurate values when reading the attribute group. Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-11KVM: PPC: Book3S HV: Fix build error in book3s_hv.cGavin Shan
When CONFIG_KVM_XICS is enabled, CPU_UP_PREPARE and other macros for CPU states in linux/cpu.h are needed by arch/powerpc/kvm/book3s_hv.c. Otherwise, build error as below is seen: gwshan@gwshan:~/sandbox/l$ make arch/powerpc/kvm/book3s_hv.o : CC arch/powerpc/kvm/book3s_hv.o arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_cpu_notify’: arch/powerpc/kvm/book3s_hv.c:3072:7: error: ‘CPU_UP_PREPARE’ \ undeclared (first use in this function) This fixes the issue introduced by commit <6f3bb80944> ("KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs"). Fixes: 6f3bb8094414 Cc: stable@vger.kernel.org # v4.6 Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-11KVM: PPC: Fix emulated MMIO sign-extensionPaul Mackerras
When the guest does a sign-extending load instruction (such as lha or lwa) to an emulated MMIO location, it results in a call to kvmppc_handle_loads() in the host. That function sets the vcpu->arch.mmio_sign_extend flag and calls kvmppc_handle_load() to do the rest of the work. However, kvmppc_handle_load() sets the mmio_sign_extend flag to 0 unconditionally, so the sign extension never gets done. To fix this, we rename kvmppc_handle_load to __kvmppc_handle_load and add an explicit parameter to indicate whether sign extension is required. kvmppc_handle_load() and kvmppc_handle_loads() then become 1-line functions that just call __kvmppc_handle_load() with the extra parameter. Reported-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-11KVM: PPC: Fix debug macrosAlexey Kardashevskiy
When XICS_DBG is enabled, gcc produces format errors. This fixes formats to match passed values types. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-11KVM: PPC: Book3S PR: Manage single-step modeLaurent Vivier
Until now, when we connect gdb to the QEMU gdb-server, the single-step mode is not managed. This patch adds this, only for kvm-pr: If KVM_GUESTDBG_SINGLESTEP is set, we enable single-step trace bit in the MSR (MSR_SE) just before the __kvmppc_vcpu_run(), and disable it just after. In kvmppc_handle_exit_pr, instead of routing the interrupt to the guest, we return to host, with KVM_EXIT_DEBUG reason. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-05-01powerpc/mm/hash: Add support for Power9 HashAneesh Kumar K.V
PowerISA 3.0 adds a parition table indexed by LPID. Parition table allows us to specify the MMU model that will be used for guest and host translation. This patch adds support with SLB based hash model (UPRT = 0). What is required with this model is to support the new hash page table entry format and also setup partition table such that we use hash table for address translation. We don't have segment table support yet. In order to make sure we don't load KVM module on Power9 (since we don't have kvm support yet) this patch also disables KVM on Power9. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01powerpc/mm: Drop WIMG in favour of new constantsAneesh Kumar K.V
PowerISA 3.0 introduces two pte bits with the below meaning for radix: 00 -> Normal Memory 01 -> Strong Access Order (SAO) 10 -> Non idempotent I/O (Cache inhibited and guarded) 11 -> Tolerant I/O (Cache inhibited) We drop the existing WIMG bits in the Linux page table in favour of the above constants. We loose _PAGE_WRITETHRU with this conversion. We only use writethru via pgprot_cached_wthru() which is used by fbdev/controlfb.c which is Apple control display and also PPC32. With respect to _PAGE_COHERENCE, we have been marking hpte always coherent for some time now. htab_convert_pte_flags() always added HPTE_R_M. NOTE: KVM changes need closer review. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-22KVM: PPC: do not compile in vfio.o unconditionallyPaolo Bonzini
Build on 32-bit PPC fails with the following error: int kvm_vfio_ops_init(void) ^ In file included from arch/powerpc/kvm/../../../virt/kvm/vfio.c:21:0: arch/powerpc/kvm/../../../virt/kvm/vfio.h:8:90: note: previous definition of ‘kvm_vfio_ops_init’ was here arch/powerpc/kvm/../../../virt/kvm/vfio.c:292:6: error: redefinition of ‘kvm_vfio_ops_exit’ void kvm_vfio_ops_exit(void) ^ In file included from arch/powerpc/kvm/../../../virt/kvm/vfio.c:21:0: arch/powerpc/kvm/../../../virt/kvm/vfio.h:12:91: note: previous definition of ‘kvm_vfio_ops_exit’ was here scripts/Makefile.build:258: recipe for target arch/powerpc/kvm/../../../virt/kvm/vfio.o failed make[3]: *** [arch/powerpc/kvm/../../../virt/kvm/vfio.o] Error 1 Check whether CONFIG_KVM_VFIO is set before including vfio.o in the build. Reported-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter()Lan Tianyu
The barrier also orders the write to mode from any reads to the page tables done and so update the comment. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: PPC: Create a virtual-mode only TCE table handlersAlexey Kardashevskiy
Upcoming in-kernel VFIO acceleration needs different handling in real and virtual modes which makes it hard to support both modes in the same handler. This creates a copy of kvmppc_rm_h_stuff_tce and kvmppc_rm_h_put_tce in addition to the existing kvmppc_rm_h_put_tce_indirect. This also fixes linker breakage when only PR KVM was selected (leaving HV KVM off): the kvmppc_h_put_tce/kvmppc_h_stuff_tce functions would not compile at all and the linked would fail. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-19Merge tag 'powerpc-4.6-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "This was delayed a day or two by some build-breakage on old toolchains which we've now fixed. There's two PCI commits both acked by Bjorn. There's one commit to mm/hugepage.c which is (co)authored by Kirill. Highlights: - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V - Add POWER9 cputable entry from Michael Neuling - FPU/Altivec/VSX save/restore optimisations from Cyril Bur - Add support for new ftrace ABI on ppc64le from Torsten Duwe Various cleanups & minor fixes from: - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh. General: - atomics: Allow architectures to define their own __atomic_op_* helpers from Boqun Feng - Implement atomic{, 64}_*_return_* variants and acquire/release/ relaxed variants for (cmp)xchg from Boqun Feng - Add powernv_defconfig from Jeremy Kerr - Fix BUG_ON() reporting in real mode from Balbir Singh - Add xmon command to dump OPAL msglog from Andrew Donnellan - Add xmon command to dump process/task similar to ps(1) from Douglas Miller - Clean up memory hotplug failure paths from David Gibson pci/eeh: - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei Yang. - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan. - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang - PCI: Add pcibios_bus_add_device() weak function from Wei Yang - MAINTAINERS: Update EEH details and maintainership from Russell Currey cxl: - Support added to the CXL driver for running on both bare-metal and hypervisor systems, from Christophe Lombard and Frederic Barrat. - Ignore probes for virtual afu pci devices from Vaibhav Jain perf: - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu - hv-24x7: Fix usage with chip events, display change in counter values, display domain indices in sysfs, eliminate domain suffix in event names, from Sukadev Bhattiprolu Freescale: - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt bits, and minor fixes/cleanup" * tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits) powerpc: Fix unrecoverable SLB miss during restore_math() powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers powerpc/rcpm: Fix build break when SMP=n powerpc/book3e-64: Use hardcoded mttmr opcode powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible powerpc/T104xRDB: add tdm riser card node to device tree powerpc32: PAGE_EXEC required for inittext powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s) powerpc/86xx: Introduce and use common dtsi powerpc/86xx: Update device tree powerpc/86xx: Move dts files to fsl directory powerpc/86xx: Switch to kconfig fragments approach powerpc/86xx: Update defconfigs powerpc/86xx: Consolidate common platform code powerpc32: Remove one insn in mulhdu powerpc32: small optimisation in flush_icache_range() powerpc: Simplify test in __dma_sync() powerpc32: move xxxxx_dcache_range() functions inline powerpc32: Remove clear_pages() and define clear_page() inline ...
2016-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
2016-03-15Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - Make schedstats a runtime tunable (disabled by default) and optimize it via static keys. As most distributions enable CONFIG_SCHEDSTATS=y due to its instrumentation value, this is a nice performance enhancement. (Mel Gorman) - Implement 'simple waitqueues' (swait): these are just pure waitqueues without any of the more complex features of full-blown waitqueues (callbacks, wake flags, wake keys, etc.). Simple waitqueues have less memory overhead and are faster. Use simple waitqueues in the RCU code (in 4 different places) and for handling KVM vCPU wakeups. (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker, Marcelo Tosatti) - sched/numa enhancements (Rik van Riel) - NOHZ performance enhancements (Rik van Riel) - Various sched/deadline enhancements (Steven Rostedt) - Various fixes (Peter Zijlstra) - ... and a number of other fixes, cleanups and smaller enhancements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) sched/cputime: Fix steal_account_process_tick() to always return jiffies sched/deadline: Remove dl_new from struct sched_dl_entity Revert "kbuild: Add option to turn incompatible pointer check into error" sched/deadline: Remove superfluous call to switched_to_dl() sched/debug: Fix preempt_disable_ip recording for preempt_disable() sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity time, acct: Drop irq save & restore from __acct_update_integrals() acct, time: Change indentation in __acct_update_integrals() sched, time: Remove non-power-of-two divides from __acct_update_integrals() sched/rt: Kick RT bandwidth timer immediately on start up sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug sched/debug: Move sched_domain_sysctl to debug.c sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c sched/rt: Fix PI handling vs. sched_setscheduler() sched/core: Remove duplicated sched_group_set_shares() prototype sched/fair: Consolidate nohz CPU load update code sched/fair: Avoid using decay_load_missed() with a negative value sched/deadline: Always calculate end of period on sched_yield() sched/cgroup: Fix cgroup entity load tracking tear-down rcu: Use simple wait queues where possible in rcutree ...
2016-03-08KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exitPaul Mackerras
Thomas Huth discovered that a guest could cause a hard hang of a host CPU by setting the Instruction Authority Mask Register (IAMR) to a suitable value. It turns out that this is because when the code was added to context-switch the new special-purpose registers (SPRs) that were added in POWER8, we forgot to add code to ensure that they were restored to a sane value on guest exit. This adds code to set those registers where a bad value could compromise the execution of the host kernel to a suitable neutral value on guest exit. Cc: stable@vger.kernel.org # v3.14+ Fixes: b005255e12a3 Reported-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-03powerpc/mm: Move hash related mmu-*.h headers to book3s/Aneesh Kumar K.V
No code changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01KVM: PPC: Add support for 64bit TCE windowsAlexey Kardashevskiy
The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not enough for directly mapped windows as the guest can get more than 4GB. This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against the locked memory limit. Since 64bit windows are to support Dynamic DMA windows (DDW), let's add @bus_offset and @page_shift which are also required by DDW. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-01KVM: PPC: Add @offset to kvmppc_spapr_tce_tableAlexey Kardashevskiy
This enables userspace view of TCE tables to start from non-zero offset on a bus. This will be used for huge DMA windows. This only changes the internal structure, the user interface needs to change in order to use an offset. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-01KVM: PPC: Add @page_shift to kvmppc_spapr_tce_tableAlexey Kardashevskiy
At the moment the kvmppc_spapr_tce_table struct can only describe 4GB windows and handle fixed size (4K) pages. Dynamic DMA windows support more so these limits need to be extended. This replaces window_size (in bytes, 4GB max) with page_shift (32bit) and size (64bit, in pages). This should cause no behavioural change as this is changing the internal structures only - the user interface still only allows one to create a 32-bit table with 4KiB pages at this stage. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-01powerpc: Fix misspellings in comments.Adam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29KVM: PPC: Book3S HV: Add tunable to control H_IPI redirectionSuresh E. Warrier
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core running in the host is usually a good idea, most workloads seemed to benefit. However, in one heavily interrupt-driven SMT1 workload, some regression was observed. This patch adds a kvm_hv module parameter called h_ipi_redirect to control this feature. The default value for this tunable is 1 - that is enable the feature. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Send IPI to host core to wake VCPUSuresh E. Warrier
This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by: - Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm(). - Any PPC_MSG_RM_HOST_ACTION messages must be executed first before any other PPC_MSG_CALL_FUNCTION messages. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVMSuresh Warrier
This patch adds the support for the kick VCPU operation for kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function provides the function to be invoked for a host side operation when poked by the real mode KVM. This is initiated by KVM by sending an IPI to any free host core. KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and rm_data to point to the VCPU to be woken up before sending the IPI. Note that we have allocated one kvmppc_host_rm_core structure per core. The above values need to be set in the structure corresponding to the core to which the IPI will be sent. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUsSuresh Warrier
The kvmppc_host_rm_ops structure keeps track of which cores are are in the host by maintaining a bitmask of active/runnable online CPUs that have not entered the guest. This patch adds support to manage the bitmask when a CPU is offlined or onlined in the host. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Manage core host stateSuresh Warrier
Update the core host state in kvmppc_host_rm_ops whenever the primary thread of the core enters the guest or returns back. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Host-side RM data structuresSuresh Warrier
This patch defines the data structures to support the setting up of host side operations while running in real mode in the guest, and also the functions to allocate and free it. The operations are for now limited to virtual XICS operations. Currently, we have only defined one operation in the data structure: - Wake up a VCPU sleeping in the host when it receives a virtual interrupt The operations are assigned at the core level because PowerKVM requires that the host run in SMT off mode. For each core, we will need to manage its state atomically - where the state is defined by: 1. Is the core running in the host? 2. Is there a Real Mode (RM) operation pending on the host? Currently, core state is only managed at the whole-core level even when the system is in split-core mode. This just limits the number of free or "available" cores in the host to perform any host-side operations. The kvmppc_host_rm_core.rm_data allows any data to be passed by KVM in real mode to the host core along with the operation to be performed. The kvmppc_host_rm_ops structure is allocated the very first time a guest VM is started. Initial core state is also set - all online cores are in the host. This structure is never deleted, not even when there are no active guests. However, it needs to be freed when the module is unloaded because the kvmppc_host_rm_ops_hv can contain function pointers to kvm-hv.ko functions for the different supported host operations. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-25KVM: Use simple waitqueue for vcpu->wqMarcelo Tosatti
The problem: On -rt, an emulated LAPIC timer instances has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Daniel writes: Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency benchmark on mainline. The test was run 1000 times on tip/sched/core 4.4.0-rc8-01134-g0905f04: ./x86-run x86/tscdeadline_latency.flat -cpu host with idle=poll. The test seems not to deliver really stable numbers though most of them are smaller. Paolo write: "Anything above ~10000 cycles means that the host went to C1 or lower---the number means more or less nothing in that case. The mean shows an improvement indeed." Before: min max mean std count 1000.000000 1000.000000 1000.000000 1000.000000 mean 5162.596000 2019270.084000 5824.491541 20681.645558 std 75.431231 622607.723969 89.575700 6492.272062 min 4466.000000 23928.000000 5537.926500 585.864966 25% 5163.000000 1613252.750000 5790.132275 16683.745433 50% 5175.000000 2281919.000000 5834.654000 23151.990026 75% 5190.000000 2382865.750000 5861.412950 24148.206168 max 5228.000000 4175158.000000 6254.827300 46481.048691 After min max mean std count 1000.000000 1000.00000 1000.000000 1000.000000 mean 5143.511000 2076886.10300 5813.312474 21207.357565 std 77.668322 610413.09583 86.541500 6331.915127 min 4427.000000 25103.00000 5529.756600 559.187707 25% 5148.000000 1691272.75000 5784.889825 17473.518244 50% 5160.000000 2308328.50000 5832.025000 23464.837068 75% 5172.000000 2393037.75000 5853.177675 24223.969976 max 5222.000000 3922458.00000 6186.720500 42520.379830 [Patch was originaly based on the swait implementation found in the -rt tree. Daniel ported it to mainline's version and gathered the benchmark numbers for tscdeadline_latency test.] Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-16KVM: PPC: Add support for multiple-TCE hcallsAlexey Kardashevskiy
This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO devices or emulated PCI. These calls allow adding multiple entries (up to 512) into the TCE table in one call which saves time on transition between kernel and user space. The current implementation of kvmppc_h_stuff_tce() allows it to be executed in both real and virtual modes so there is one helper. The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address to the host address and since the translation is different, there are 2 helpers - one for each mode. This implements the KVM_CAP_PPC_MULTITCE capability. When present, the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL. If they can not be handled by the kernel, they are passed on to the user space. The user space still has to have an implementation for these. Both HV and PR-syle KVM are supported. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpersAlexey Kardashevskiy
Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls) will validate TCE (not to have unexpected bits) and IO address (to be within the DMA window boundaries). This introduces helpers to validate TCE and IO address. The helpers are exported as they compile into vmlinux (to work in realmode) and will be used later by KVM kernel module in virtual mode. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4KAlexey Kardashevskiy
SPAPR_TCE_SHIFT is used in few places only and since IOMMU_PAGE_SHIFT_4K can be easily used instead, remove SPAPR_TCE_SHIFT. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16KVM: PPC: Account TCE-containing pages in locked_vmAlexey Kardashevskiy
At the moment pages used for TCE tables (in addition to pages addressed by TCEs) are not counted in locked_vm counter so a malicious userspace tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and lock a lot of memory. This adds counting for pages used for TCE tables. This counts the number of pages required for a table plus pages for the kvmppc_spapr_tce_table struct (TCE table descriptor) itself. This changes release_spapr_tce_table() to store @npages on stack to avoid calling kvmppc_stt_npages() in the loop (tiny optimization, probably). This does not change the amount of used memory. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16KVM: PPC: Use RCU for arch.spapr_tce_tablesAlexey Kardashevskiy
At the moment only spapr_tce_tables updates are protected against races but not lookups. This fixes missing protection by using RCU for the list. As lookups also happen in real mode, this uses list_for_each_entry_lockless() (which is expected not to access any vmalloc'd memory). This converts release_spapr_tce_table() to a RCU scheduled handler. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlersAlexey Kardashevskiy
This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have following patches applied nicer. This moves the ioba boundaries check to a helper and adds a check for least bits which have to be zeros. The patch is pretty mechanical (only check for least ioba bits is added) so no change in behaviour is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-15vfio: Enable VFIO device for powerpcDavid Gibson
ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is used to handle any necessary interactions between KVM and VFIO. Currently that device is built on x86 and ARM, but not powerpc, although powerpc does support both KVM and VFIO. This makes things awkward in userspace Currently qemu prints an alarming error message if you attempt to use VFIO and it can't initialize the KVM VFIO device. We don't want to remove the warning, because lack of the KVM VFIO device could mean coherency problems on x86. On powerpc, however, the error is harmless but looks disturbing, and a test based on host architecture in qemu would be ugly, and break if we do need the KVM VFIO device for something important in future. There's nothing preventing the KVM VFIO device from being built for powerpc, so this patch turns it on. It won't actually do anything, since we don't define any of the arch_*() hooks, but it will make qemu happy and we can extend it in future if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-01-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "s390 and POWER bug fixes, plus enabling the KVM-VFIO interface on s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM doc: Fix KVM_SMI chapter number KVM: s390: fix memory overwrites when vx is disabled KVM: s390: Enable the KVM-VFIO device KVM: s390: fix guest fprs memory leak KVM: PPC: Fix ONE_REG AltiVec support KVM: PPC: Increase memslots to 512 KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s' KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8 KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
2016-01-26Merge tag 'kvm-s390-master-4.5-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for kvm/master (targeting 4.5) 1. Fallout of some bigger floating point/vector rework in s390 - memory leak -> stable 4.3+ - memory overwrite -> stable 4.4+ 2. enable KVM-VFIO for s390
2016-01-16kvm: rename pfn_t to kvm_pfn_tDan Williams
To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 18): The core has developed a need for a "pfn_t" type [1]. Move the existing pfn_t in KVM to kvm_pfn_t [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15Merge tag 'powerpc-4.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Core: - Ground work for the new Power9 MMU from Aneesh Kumar K.V - Optimise FP/VMX/VSX context switching from Anton Blanchard Misc: - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling, Andrew Donnellan - Allow wrapper to work on non-english system from Laurent Vivier - Add rN aliases to the pt_regs_offset table from Rashmica Gupta - Fix module autoload for rackmeter & axonram drivers from Luis de Bethencourt - Include KVM guest test in all interrupt vectors from Paul Mackerras - Fix DSCR inheritance over fork() from Anton Blanchard - Make value-returning atomics & {cmp}xchg* & their atomic_ versions fully ordered from Boqun Feng - Print MSR TM bits in oops messages from Michael Neuling - Add TM signal return & invalid stack selftests from Michael Neuling - Limit EPOW reset event warnings from Vipin K Parashar - Remove the Cell QPACE code from Rashmica Gupta - Append linux_banner to exception information in xmon from Rashmica Gupta - Add selftest to check if VSRs are corrupted from Rashmica Gupta - Remove broken GregorianDay() from Daniel Axtens - Import Anton's context_switch2 benchmark into selftests from Michael Ellerman - Add selftest script to test HMI functionality from Daniel Axtens - Remove obsolete OPAL v2 support from Stewart Smith - Make enter_rtas() private from Michael Ellerman - PPR exception cleanups from Michael Ellerman - Add page soft dirty tracking from Laurent Dufour - Add support for Nvlink NPUs from Alistair Popple - Add support for kexec on 476fpe from Alistair Popple - Enable kernel CPU dlpar from sysfs from Nathan Fontenot - Copy only required pieces of the mm_context_t to the paca from Michael Neuling - Add a kmsg_dumper that flushes OPAL console output on panic from Russell Currey - Implement save_stack_trace_regs() to enable kprobe stack tracing from Steven Rostedt - Add HWCAP bits for Power9 from Michael Ellerman - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins - scripts/recordmcount.pl: support data in text section on powerpc from Ulrich Weigand - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand cxl: - cxl: Fix possible idr warning when contexts are released from Vaibhav Jain - cxl: use correct operator when writing pcie config space values from Andrew Donnellan - cxl: Fix DSI misses when the context owning task exits from Vaibhav Jain - cxl: fix build for GCC 4.6.x from Brian Norris - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris - cxl: Enable PCI device ID for future IBM CXL adapter from Uma Krishnan Freescale: - Freescale updates from Scott: Highlights include moving QE code out of arch/powerpc (to be shared with arm), device tree updates, and minor fixes" * tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits) powerpc/module: Handle R_PPC64_ENTRY relocations scripts/recordmcount.pl: support data in text section on powerpc powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff powerpc/mm: Fix _PAGE_PTE breaking swapoff cxl: Enable PCI device ID for future IBM CXL adapter cxl: use -Werror only with CONFIG_PPC_WERROR cxl: fix build for GCC 4.6.x powerpc: Add HWCAP bits for Power9 powerpc/powernv: Reserve PE#0 on NPU powerpc/powernv: Change NPU PE# assignment powerpc/powernv: Fix update of NVLink DMA mask powerpc/powernv: Remove misleading comment in pci.c powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing powerpc: Fix build break due to paca mm_context_t changes cxl: Fix DSI misses when the context owning task exits MAINTAINERS: Update Scott Wood's e-mail address powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery() powerpc: Fix style of self-test config prompts powerpc/powernv: Only delay opal_rtc_read() retry when necessary ...
2016-01-14KVM: PPC: Fix ONE_REG AltiVec supportGreg Kurz
The get and set operations got exchanged by mistake when moving the code from book3s.c to powerpc.c. Fixes: 3840edc8033ad5b86deee309c1c321ca54257452 Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-01-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "PPC changes will come next week. - s390: Support for runtime instrumentation within guests, support of 248 VCPUs. - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM identifiers. Performance counter virtualization missed the boat. - x86: Support for more Hyper-V features (synthetic interrupt controller), MMU cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits) kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL kvm/x86: Hyper-V SynIC timers tracepoints kvm/x86: Hyper-V SynIC tracepoints kvm/x86: Update SynIC timers on guest entry only kvm/x86: Skip SynIC vector check for QEMU side kvm/x86: Hyper-V fix SynIC timer disabling condition kvm/x86: Reorg stimer_expiration() to better control timer restart kvm/x86: Hyper-V unify stimer_start() and stimer_restart() kvm/x86: Drop stimer_stop() function kvm/x86: Hyper-V timers fix incorrect logical operation KVM: move architecture-dependent requests to arch/ KVM: renumber vcpu->request bits KVM: document which architecture uses each request bit KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock KVM: s390: implement the RI support of guest kvm/s390: drop unpaired smp_mb kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table() arm/arm64: KVM: Detect vGIC presence at runtime ...
2015-12-10KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSRPaul Mackerras
Currently it is possible for userspace (e.g. QEMU) to set a value for the MSR for a guest VCPU which has both of the TS bits set, which is an illegal combination. The result of this is that when we execute a hrfid (hypervisor return from interrupt doubleword) instruction to enter the guest, the CPU will take a TM Bad Thing type of program interrupt (vector 0x700). Now, if PR KVM is configured in the kernel along with HV KVM, we actually handle this without crashing the host or giving hypervisor privilege to the guest; instead what happens is that we deliver a program interrupt to the guest, with SRR0 reflecting the address of the hrfid instruction and SRR1 containing the MSR value at that point. If PR KVM is not configured in the kernel, then we try to run the host's program interrupt handler with the MMU set to the guest context, which almost certainly causes a host crash. This closes the hole by making kvmppc_set_msr_hv() check for the illegal combination and force the TS field to a safe value (00, meaning non-transactional). Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-09KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s'Geyslan G. Bem
The vcpu_book3s variable is assigned but never used. So remove it. Found using cppcheck. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-09KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8Thomas Huth
In the old DABR register, the BT (Breakpoint Translation) bit is bit number 61. In the new DAWRX register, the WT (Watchpoint Translation) bit is bit number 59. So to move the DABR-BT bit into the position of the DAWRX-WT bit, it has to be shifted by two, not only by one. This fixes hardware watchpoints in gdb of older guests that only use the H_SET_DABR/X interface instead of the new H_SET_MODE interface. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-09KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code betterPaul Mackerras
As we saw with the TM Bad Thing type of program interrupt occurring on the hrfid that enters the guest, it is not completely impossible to have a trap occurring in the guest entry/exit code, despite the fact that the code has been written to avoid taking any traps. This adds a check in the kvmppc_handle_exit_hv() function to detect the case when a trap has occurred in the hypervisor-mode code, and instead of treating it just like a trap in guest code, we now print a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR exit reason. Of the various interrupts that get handled in the assembly code in the guest exit path and that can return directly to the guest, the only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check (other than system call, which we can avoid just by not doing a sc instruction). Therefore this adds code to the machine check path to ensure that if the MCE occurred in hypervisor mode, we exit to the host rather than trying to continue the guest. Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-02powerpc: create flush_all_to_thread()Anton Blanchard
Create a single function that flushes everything (FP, VMX, VSX, SPE). Doing this all at once means we only do one MSR write. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-02powerpc: create giveup_all()Anton Blanchard
Create a single function that gives everything up (FP, VMX, VSX, SPE). Doing this all at once means we only do one MSR write. A context switch microbenchmark using yield(): http://ozlabs.org/~anton/junkcode/context_switch2.c ./context_switch2 --test=yield --fp --altivec --vector 0 0 shows an improvement of 3% on POWER8. Signed-off-by: Anton Blanchard <anton@samba.org> [mpe: giveup_all() needs to be EXPORT_SYMBOL'ed] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-01powerpc: Create disable_kernel_{fp,altivec,vsx,spe}()Anton Blanchard
The enable_kernel_*() functions leave the relevant MSR bits enabled until we exit the kernel sometime later. Create disable versions that wrap the kernel use of FP, Altivec VSX or SPE. While we don't want to disable it normally for performance reasons (MSR writes are slow), it will be used for a debug boot option that does this and catches bad uses in other areas of the kernel. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-11-30KVM: Use common function for VCPU lookup by idDavid Hildenbrand
Let's reuse the new common function for VPCU lookup by id. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [split out the new function into a separate patch]
2015-11-25KVM: powerpc: kvmppc_visible_gpa can be booleanYaowei Bai
In another patch kvm_is_visible_gfn is maken return bool due to this function only returns zero or one as its return value, let's also make kvmppc_visible_gpa return bool to keep consistent. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>