summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-09-01Merge tag 'kvm-arm-for-3.12' of ↵Gleb Natapov
git://git.linaro.org/people/cdall/linux-kvm-arm into queue KVM/ARM Updates for Linux 3.12 * tag 'kvm-arm-for-3.12' of git://git.linaro.org/people/cdall/linux-kvm-arm: ARM: KVM: Add newlines to panic strings ARM: KVM: Work around older compiler bug ARM: KVM: Simplify tracepoint text ARM: KVM: Fix kvm_set_pte assignment
2013-08-30ARM: KVM: Add newlines to panic stringsChristoffer Dall
The panic strings are hard to read and on narrow terminals some characters are simply truncated off the panic message. Make is slightly prettier with a newline in the Hyp panic strings. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-08-30ARM: KVM: Work around older compiler bugChristoffer Dall
Compilers before 4.6 do not behave well with unnamed fields in structure initializers and therefore produces build errors: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676 By refering to the unnamed union using braces, both older and newer compilers produce the same result. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Russell King <linux@arm.linux.org.uk> Tested-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-08-30ARM: KVM: Simplify tracepoint textChristoffer Dall
The tracepoint for kvm_guest_fault was extremely long, make it a slightly bit shorter. Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-08-30ARM: KVM: Fix kvm_set_pte assignmentChristoffer Dall
THe kvm_set_pte function was actually assigning the entire struct to the structure member, which should work because the structure only has that one member, but it is still not very nice. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-08-30ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256Christoffer Dall
The Versatile Express TC2 board, which we use as our main emulated platform in QEMU, defines 160+32 == 192 interrupts, so limiting the number of interrupts to 128 is not quite going to cut it for real board emulation. Note that this didn't use to be a problem because QEMU was buggy and only defined 128 interrupts until recently. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regsChristoffer Dall
For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in one word and since there are 32 private (per cpu) irqs, we have 8 private u32 fields on the vgic_bytemap struct. We shift the offset from the base of the register group right by 2, giving us the word index instead of the field index. But then there are 8 private words, not 4, which is also why we subtract 8 words from the offset of the shared words. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30ARM: KVM: vgic: fix GICD_ICFGRn accessMarc Zyngier
All the code in handle_mmio_cfg_reg() assumes the offset has been shifted right to accomodate for the 2:1 bit compression, but this is only done when getting the register address. Shift the offset early so the code works mostly unchanged. Reported-by: Zhaobo (Bob, ERC) <zhaobo@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30ARM: KVM: vgic: simplify vgic_get_target_regMarc Zyngier
vgic_get_target_reg is quite complicated, for no good reason. Actually, it is fairly easy to write it in a much more efficient way by using the target CPU array instead of the bitmap. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into queueGleb Natapov
* 'kvm-ppc-next' of git://github.com/agraf/linux-2.6: KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation powerpc/kvm: Copy the pvr value after memset KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry kvm/ppc/booke: Don't call kvm_guest_enter twice kvm/ppc: Call trace_hardirqs_on before entry KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers KVM: PPC: Book3S HV: Correct tlbie usage powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation. powerpc/kvm: Contiguous memory allocator based RMA allocation powerpc/kvm: Contiguous memory allocator based hash page table allocation KVM: PPC: Book3S: Ignore DABR register mm/cma: Move dma contiguous changes into a seperate config
2013-08-29KVM: MMU: remove unused parameterXiao Guangrong
vcpu in page_fault_can_be_fast() is not used so remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-28Merge remote-tracking branch 'origin/next' into kvm-ppc-nextAlexander Graf
Conflicts: mm/Kconfig CMA DMA split and ZSWAP introduction were conflicting, fix up manually.
2013-08-28KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()Paul Mackerras
This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large page bit in the hashed page table entries (HPTEs) it looks at, and to simplify and streamline the code. The checking of the first dword of each HPTE is now done with a single mask and compare operation, and all the code dealing with the matching HPTE, if we find one, is consolidated in one place in the main line of the function flow. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S PR: Make instruction fetch fallback work for system callsPaul Mackerras
It turns out that if we exit the guest due to a hcall instruction (sc 1), and the loading of the instruction in the guest exit path fails for any reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the instruction after the hcall instruction rather than the hcall itself. This in turn means that the instruction doesn't get recognized as an hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel as a sc instruction. That usually results in the guest kernel getting a return code of 38 (ENOSYS) from an hcall, which often triggers a BUG_ON() or other failure. This fixes the problem by adding a new variant of kvmppc_get_last_inst() called kvmppc_get_last_sc(), which fetches the instruction if necessary from pc - 4 rather than pc. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMXPaul Mackerras
Currently the code assumes that once we load up guest FP/VSX or VMX state into the CPU, it stays valid in the CPU registers until we explicitly flush it to the thread_struct. However, on POWER7, copy_page() and memcpy() can use VMX. These functions do flush the VMX state to the thread_struct before using VMX instructions, but if this happens while we have guest state in the VMX registers, and we then re-enter the guest, we don't reload the VMX state from the thread_struct, leading to guest corruption. This has been observed to cause guest processes to segfault. To fix this, we check before re-entering the guest that all of the bits corresponding to facilities owned by the guest, as expressed in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr. Any bits that have been cleared correspond to facilities that have been used by kernel code and thus flushed to the thread_struct, so for them we reload the state from the thread_struct. We also need to check current->thread.regs->msr before calling giveup_fpu() or giveup_altivec(), since if the relevant bit is clear, the state has already been flushed to the thread_struct and to flush it again would corrupt it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: x86: update masterclock when kvmclock_offset is calculated (v2)Marcelo Tosatti
The offset to add to the hosts monotonic time, kvmclock_offset, is calculated against the monotonic time at KVM_SET_CLOCK ioctl time. Request a master clock update at this time, to reduce a potentially unbounded difference between the values of the masterclock and the clock value used to calculate kvmclock_offset. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-28KVM: PPC: Book3S: Fix compile error in XICS emulationPaul Mackerras
Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation") added a call to get_tb() but didn't include the header that defines it, and on some configs this means book3s_xics.c fails to compile: arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’: arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration] Cc: stable@vger.kernel.org [v3.10, v3.11] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: PPC: Book3S PR: return appropriate error when allocation failsThadeu Lima de Souza Cascardo
err was overwritten by a previous function call, and checked to be 0. If the following page allocation fails, 0 is going to be returned instead of -ENOMEM. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28arch: powerpc: kvm: add signed type cast for comparationChen Gang
'rmls' is 'unsigned long', lpcr_rmls() will return negative number when failure occurs, so it need a type cast for comparing. 'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number when failure occurs, so it need a type cast for comparing. Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-28KVM: x86: add comments where MMIO does not return to the emulatorPaolo Bonzini
Support for single-step in the emulator (new in 3.12) does not work for MMIO or PIO writes, because they are completed without returning to the emulator. This is not worse than what we had in 3.11; still, add comments so that the issue is not forgotten. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-28KVM: vmx: count exits to userspace during invalid guest emulationPaolo Bonzini
These will happen due to MMIO. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-28KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmpPaolo Bonzini
This is the type-safe comparison function, so the double-underscore is not related. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-27kvm: optimize away THP checks in kvm_is_mmio_pfn()Andrea Arcangeli
The checks on PG_reserved in the page structure on head and tail pages aren't necessary because split_huge_page wouldn't transfer the PG_reserved bit from head to tail anyway. This was a forward-thinking check done in the case PageReserved was set by a driver-owned page mapped in userland with something like remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not possible right now). It was meant to be very safe, but it's overkill as it's unlikely split_huge_page could ever run without the driver noticing and tearing down the hugepage itself. And if a driver in the future will really want to map a reserved hugepage in userland using an huge pmd it should simply take care of marking all subpages reserved too to keep KVM safe. This of course would require such a hypothetical driver to tear down the huge pmd itself and splitting the hugepage itself, instead of relaying on split_huge_page, but that sounds very reasonable, especially considering split_huge_page wouldn't currently transfer the reserved bit anyway. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26KVM: PPC: reserve a capability number for multitce supportAlexey Kardashevskiy
This is to reserve a capablity number for upcoming support of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls which support mulptiple DMA map/unmap operations per one call. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flagYann Droneaud
KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26kvm: use anon_inode_getfd() with O_CLOEXEC flagYann Droneaud
KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26Documentation/kvm : Add documentation on Hypercalls and features used for PV ↵Raghavendra K T
spinlock KVM_HC_KICK_CPU hypercall added to wakeup halted vcpu in paravirtual spinlock enabled guest. KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled in guest. Thanks Vatsa for rewriting KVM_HC_KICK_CPU Cc: Rob Landley <rob@landley.net> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apicRaghavendra K T
Note that we are using APIC_DM_REMRD which has reserved usage. In future if APIC_DM_REMRD usage is standardized, then we should find some other way or go back to old method. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocksSrivatsa Vaddagiri
kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. the presence of these hypercalls is indicated to guest via kvm_feature_pv_unhalt. Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration During migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com> [Raghu: Apic related changes, folding pvunhalted into vcpu_runnable Added flags for future use (suggested by Gleb)] [ Raghu: fold pv_unhalt flag as suggested by Eric Northup] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapiRaghavendra K T
this is needed by both guest and host. Originally-from: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26mips/kvm: Make kvm_locore.S 64-bit buildable/safe.David Daney
We need to use more of the Macros in asm.h to allow kvm_locore.S to build in a 64-bit kernel. For 32-bit there is no change in the generated object code. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26mips/kvm: Cleanup .push/.pop directives in kvm_locore.SDavid Daney
There are: .set push .set noreorder .set noat . . . .set pop Sequences all over the place in this file, but in some places the final ".set pop" is erroneously converted to ".set push", so none of these really do what they appear to. Clean up the whole mess by moving ".set noreorder", ".set noat" to the top, and get rid of everything else. Generated object code is unchanged. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26mips/kvm: Improve code formatting in arch/mips/kvm/kvm_locore.SDavid Daney
No code changes, just reflowing some comments and consistently using tabs and spaces. Object code is verified to be unchanged. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-22powerpc/kvm: Copy the pvr value after memsetAneesh Kumar K.V
Otherwise we would clear the pvr value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-08-13KVM: x86: Update symbolic exit codesJan Kiszka
Add decoding for INVEPT and reorder the list according to the reason numbers. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07KVM: nVMX: Advertise IA32_PAT in VM exit controlArthur Chunqi Li
Advertise VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT. Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07KVM: nVMX: Fix up VM_ENTRY_IA32E_MODE control feature reportingJan Kiszka
Do not report that we can enter the guest in 64-bit mode if the host is 32-bit only. This is not supported by KVM. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07KVM: nEPT: Advertise WB type EPTPJan Kiszka
At least WB must be possible. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nVMX: Keep arch.pat in sync on L1-L2 switchesJan Kiszka
When asking vmx to load the PAT MSR for us while switching from L1 to L2 or vice versa, we have to update arch.pat as well as it may later be used again to load or read out the MSR content. Reviewed-by: Gleb Natapov <gleb@redhat.com> Tested-by: Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Miscelleneous cleanupsNadav Har'El
Some trivial code cleanups not really related to nested EPT. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Some additional commentsNadav Har'El
Some additional comments to preexisting code: Explain who (L0 or L1) handles EPT violation and misconfiguration exits. Don't mention "shadow on either EPT or shadow" as the only two options. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07Advertise the support of EPT to the L1 guest, through the appropriate MSR.Nadav Har'El
This is the last patch of the basic Nested EPT feature, so as to allow bisection through this patch series: The guest will not see EPT support until this last patch, and will not attempt to use the half-applied feature. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Nested INVEPTNadav Har'El
If we let L1 use EPT, we should probably also support the INVEPT instruction. In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if last level of shadow page is unsync not all L1's changes to EPT12 are intercepted, which means roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different since roots are synced by kvm_mmu_load() each time EPTP02 changes. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: MMU context for nested EPTNadav Har'El
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new "MMU context" for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT). Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Add nEPT violation/misconfigration supportYang Zhang
Inject nEPT fault to L1 guest. This patch is original from Xinhao. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: correctly check if remote tlb flush is needed for shadowed EPT tablesGleb Natapov
need_remote_flush() assumes that shadow page is in PT64 format, but with addition of nested EPT this is no longer always true. Fix it by bits definitions that depend on host shadow page type. Reported-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Redefine EPT-specific link_shadow_page()Yang Zhang
Since nEPT doesn't support A/D bit, so we should not set those bit when build shadow page table. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Add EPT tables support to paging_tmpl.hNadav Har'El
This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significanlty improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables). This patch adds EPT support to paging_tmpl.h. paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions. There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table. So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with "EPT") which correctly read and write EPT tables. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: Support shadow paging for guest paging without A/D bitsGleb Natapov
Some guest paging modes do not support A/D bits. Add support for such modes in shadow page code. For such modes PT_GUEST_DIRTY_MASK, PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT should be set to zero. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-07nEPT: make guest's A/D bits depends on guest's paging modeGleb Natapov
This patch makes guest A/D bits definition to be dependable on paging mode, so when EPT support will be added it will be able to define them differently. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>