summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2011-11-17KVM: VMX: Check for automatic switch msr table overflowGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17KVM: VMX: Add support for guest/host-only profilingGleb Natapov
Support guest/host-only profiling by switch perf msrs on a guest entry if needed. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17KVM: VMX: add support for switching of PERF_GLOBAL_CTRLGleb Natapov
Some cpus have special support for switching PERF_GLOBAL_CTRL msr. Add logic to detect if such support exists and works properly and extend msr switching code to use it if available. Also extend number of generic msr switching entries to 8. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-10-30Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits) iommu/core: Remove global iommu_ops and register_iommu iommu/msm: Use bus_set_iommu instead of register_iommu iommu/omap: Use bus_set_iommu instead of register_iommu iommu/vt-d: Use bus_set_iommu instead of register_iommu iommu/amd: Use bus_set_iommu instead of register_iommu iommu/core: Use bus->iommu_ops in the iommu-api iommu/core: Convert iommu_found to iommu_present iommu/core: Add bus_type parameter to iommu_domain_alloc Driver core: Add iommu_ops to bus_type iommu/core: Define iommu_ops and register_iommu only with CONFIG_IOMMU_API iommu/amd: Fix wrong shift direction iommu/omap: always provide iommu debug code iommu/core: let drivers know if an iommu fault handler isn't installed iommu/core: export iommu_set_fault_handler() iommu/omap: Fix build error with !IOMMU_SUPPORT iommu/omap: Migrate to the generic fault report mechanism iommu/core: Add fault reporting mechanism iommu/core: Use PAGE_SIZE instead of hard-coded value iommu/core: use the existing IS_ALIGNED macro iommu/msm: ->unmap() should return order of unmapped page ... Fixup trivial conflicts in drivers/iommu/Makefile: "move omap iommu to dedicated iommu folder" vs "Rename the DMAR and INTR_REMAP config options" just happened to touch lines next to each other.
2011-10-30KVM: SVM: Keep intercepting task switching with NPT enabledJan Kiszka
AMD processors apparently have a bug in the hardware task switching support when NPT is enabled. If the task switch triggers a NPF, we can get wrong EXITINTINFO along with that fault. On resume, spurious exceptions may then be injected into the guest. We were able to reproduce this bug when our guest triggered #SS and the handler were supposed to run over a separate task with not yet touched stack pages. Work around the issue by continuing to emulate task switches even in NPT mode. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-10-21Merge branches 'amd/fixes', 'debug/dma-api', 'arm/omap', 'arm/msm', 'core', ↵Joerg Roedel
'iommu/fault-reporting' and 'api/iommu-ops-per-bus' into next Conflicts: drivers/iommu/amd_iommu.c drivers/iommu/iommu.c
2011-10-21iommu/core: Convert iommu_found to iommu_presentJoerg Roedel
With per-bus iommu_ops the iommu_found function needs to work on a bus_type too. This patch adds a bus_type parameter to that function and converts all call-places. The function is also renamed to iommu_present because the function now checks if an iommu is present for a given bus and does not check for a global iommu anymore. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-10-05KVM: emulate lapic tsc deadline timer for guestLiu, Jinsong
This patch emulate lapic tsc deadline timer for guest: Enumerate tsc deadline timer capability by CPUID; Enable tsc deadline timer mode by lapic MMIO; Start tsc deadline timer by WRMSR; [jan: use do_div()] [avi: fix for !irqchip_in_kernel()] [marcelo: another fix for !irqchip_in_kernel()] Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: Fix simultaneous NMIsAvi Kivity
If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first. Fix by using a counter for pending NMIs instead of a bool; since the counter limit depends on whether the processor is currently in an NMI handler, which can only be checked in vcpu context (via the NMI mask), we add a new KVM_REQ_NMI to request recalculation of the counter. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: convert push %sreg/pop %sreg to direct decodeAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decodeAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: streamline decode of segment registersAvi Kivity
The opcodes push %seg pop %seg l%seg, %mem, %reg (e.g. lds/les/lss/lfs/lgs) all have an segment register encoded in the instruction. To allow reuse, decode the segment number into src2 during the decode stage instead of the execution stage. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: simplify OpMem64 decodeAvi Kivity
Use the same technique as the other OpMem variants, and goto mem_common. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: switch src decode to decode_operand()Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: qualify OpReg inhibit_byte_regs hackAvi Kivity
OpReg decoding has a hack that inhibits byte registers for movsx and movzx instructions. It should be replaced by something better, but meanwhile, qualify that the hack is only active for the destination operand. Note these instructions only use OpReg for the destination, but better to be explicit about it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: switch OpImmUByte decode to decode_imm()Avi Kivity
Similar to SrcImmUByte. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: free up some flag bits near src, dstAvi Kivity
Op fields are going to grow by a bit, we need two free bits. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: switch src2 to generic decode_operand()Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: expand decode flags to 64 bitsAvi Kivity
Unifiying the operands means not taking advantage of the fact that some operand types can only go into certain operands (for example, DI can only be used by the destination), so we need more bits to hold the operand type. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: split dst decode to a generic decode_operand()Avi Kivity
Instead of decoding each operand using its own code, use a generic function. Start with the destination operand. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: move memop, memopp into emulation contextAvi Kivity
Simplifies further generalization of decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: convert group 3 instructions to direct decodeAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86: Add module parameter for lapic periodic timer limitJan Kiszka
Certain guests, specifically RTOSes, request faster periodic timers than what we allow by default. Add a module parameter to adjust the limit for non-standard setups. Also add a rate-limited warning in case the guest requested more. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: Clean up and extend rate-limited outputJan Kiszka
The use of printk_ratelimit is discouraged, replace it with pr*_ratelimited or __ratelimit. While at it, convert remaining guest-triggerable printks to rate-limited variants. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86: Avoid guest-triggerable printks in APIC modelJan Kiszka
Convert remaining printks that the guest can trigger to apic_printk. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86: Move kvm_trace_exit into atomic vmexit sectionJan Kiszka
This avoids that events causing the vmexit are recorded before the actual exit reason. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: disable writeback for TESTAvi Kivity
The TEST instruction doesn't write its destination operand. This could cause problems if an MMIO register was accessed using the TEST instruction. Recently Windows XP was observed to use TEST against the APIC ICR; this can cause spurious IPIs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: simplify emulate_1op_rax_rdx()Avi Kivity
emulate_1op_rax_rdx() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: merge the two emulate_1op_rax_rdx implementationsAvi Kivity
We have two emulate-with-extended-accumulator implementations: once which expect traps (_ex) and one which doesn't (plain). Drop the plain implementation and always use the one which expects traps; it will simply return 0 in the _ex argument and we can happily ignore it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: simplify emulate_1op()Avi Kivity
emulate_1op() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: simplify emulate_2op_cl()Avi Kivity
emulate_2op_cl() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: simplify emulate_2op_cl()Avi Kivity
emulate_2op_cl() is always called with the same parameters. Simplify by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86 emulator: simplify emulate_2op_SrcV()Avi Kivity
emulate_2op_SrcV(), and its siblings, emulate_2op_SrcV_nobyte() and emulate_2op_SrcB(), all use the same calling conventions and all get passed exactly the same parameters. Simplify them by passing just the emulation context. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: APIC: avoid instruction emulation for EOI writesKevin Tian
Instruction emulation for EOI writes can be skipped, since sane guest simply uses MOV instead of string operations. This is a nice improvement when guest doesn't support x2apic or hyper-V EOI support. a single VM bandwidth is observed with ~8% bandwidth improvement (7.4Gbps->8Gbps), by saving ~5% cycles from EOI emulation. Signed-off-by: Kevin Tian <kevin.tian@intel.com> <Based on earlier work from>: Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: SVM: Fix TSC MSR read in nested SVMNadav Har'El
When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be read without exit), we need to return L2's notion of the TSC, not L1's. The current code incorrectly returned L1 TSC, because svm_get_msr() was also used in x86.c where this was assumed, but now that these places call the new svm_read_l1_tsc(), the MSR read can be fixed. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: nVMX: Fix nested VMX TSC emulationNadav Har'El
This patch fixes two corner cases in nested (L2) handling of TSC-related issues: 1. Somewhat suprisingly, according to the Intel spec, if L1 allows WRMSR to the TSC MSR without an exit, then this should set L1's TSC value itself - not offset by vmcs12.TSC_OFFSET (like was wrongly done in the previous code). 2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore the vmcs12.TSC_OFFSET. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: L1 TSC handlingNadav Har'El
KVM assumed in several places that reading the TSC MSR returns the value for L1. This is incorrect, because when L2 is running, the correct TSC read exit emulation is to return L2's value. We therefore add a new x86_ops function, read_l1_tsc, to use in places that specifically need to read the L1 TSC, NOT the TSC of the current level of guest. Note that one change, of one line in kvm_arch_vcpu_load, is made redundant by a different patch sent by Zachary Amsden (and not yet applied): kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of course we didn't have to change the call of kvm_get_msr() to read_l1_tsc(). [avi: moved callback to kvm_x86_ops tsc block] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Acked-by: Zachary Amsdem <zamsden@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: MMU: Fix SMEP failure during fetchYang, Wei Y
This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK bit set in the case of SMEP fault. The code updated 'eperm' after the variable was checked. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: MMU: Do not unconditionally read PDPTE from guest memoryAvi Kivity
Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded. On SVM, it is not possible to implement this, but on VMX this is possible and was indeed implemented until nested SVM changed this to unconditionally read PDPTEs dynamically. This has noticable impact when running PAE guests. Fix by changing the MMU to read PDPTRs from the cache, falling back to reading from memory for the nested MMU. Signed-off-by: Avi Kivity <avi@redhat.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: VMX: trivial: use BUG_ONJulia Lawall
Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ -if (x) BUG(); +BUG_ON(x); @@ identifier x; @@ -if (!x) BUG(); +BUG_ON(!x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-09-25KVM: x86: report valid microcode update IDMarcelo Tosatti
Windows Server 2008 SP2 checked build with smp > 1 BSOD's during boot due to lack of microcode update: *** Assertion failed: The system BIOS on this machine does not properly support the processor. The system BIOS did not load any microcode update. A BIOS containing the latest microcode update is needed for system reliability. (CurrentUpdateRevision != 0) *** Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440 Report a non-zero microcode update signature to make it happy. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: x86 emulator: Make x86_decode_insn() return proper macrosTakuya Yoshikawa
Return EMULATION_OK/FAILED consistently. Also treat instruction fetch errors, not restricted to X86EMUL_UNHANDLEABLE, as EMULATION_FAILED; although this cannot happen in practice, the current logic will continue the emulation even if the decoder fails to fetch the instruction. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: x86 emulator: Let compiler know insn_fetch() rarely failsTakuya Yoshikawa
Fetching the instruction which was to be executed by the guest cannot fail normally. So compiler should always predict that it will succeed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: x86 emulator: Drop _size argument from insn_fetch()Takuya Yoshikawa
_type is enough to know the size. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: x86 emulator: Use ctxt->_eip directly in do_insn_fetch_byte()Takuya Yoshikawa
Instead of passing ctxt->_eip from insn_fetch() call sites, get it from ctxt in do_insn_fetch_byte(). This is done by replacing the argument _eip of insn_fetch() with _ctxt, which should be better than letting the macro use ctxt silently in its body. Though this changes the place where ctxt->_eip is incremented from insn_fetch() to do_insn_fetch_byte(), this does not have any real effect. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: Intelligent device lookup on I/O busSasha Levin
Currently the method of dealing with an IO operation on a bus (PIO/MMIO) is to call the read or write callback for each device registered on the bus until we find a device which handles it. Since the number of devices on a bus can be significant due to ioeventfds and coalesced MMIO zones, this leads to a lot of overhead on each IO operation. Instead of registering devices, we now register ranges which points to a device. Lookup is done using an efficient bsearch instead of a linear search. Performance test was conducted by comparing exit count per second with 200 ioeventfds created on one byte and the guest is trying to access a different byte continuously (triggering usermode exits). Before the patch the guest has achieved 259k exits per second, after the patch the guest does 274k exits per second. Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: Use __print_symbolic() for vmexit tracepointsStefan Hajnoczi
The vmexit tracepoints format the exit_reason to make it human-readable. Since the exit_reason depends on the instruction set (vmx or svm), formatting is handled with ftrace_print_symbols_seq() by referring to the appropriate exit reason table. However, the ftrace_print_symbols_seq() function is not meant to be used directly in tracepoints since it does not export the formatting table which userspace tools like trace-cmd and perf use to format traces. In practice perf dies when formatting vmexit-related events and trace-cmd falls back to printing the numeric value (with extra formatting code in the kvm plugin to paper over this limitation). Other userspace consumers of vmexit-related tracepoints would be in similar trouble. To avoid significant changes to the kvm_exit tracepoint, this patch moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and selects the right table with __print_symbolic() depending on the instruction set. Note that __print_symbolic() is designed for exporting the formatting table to userspace and allows trace-cmd and perf to work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: Record instruction set in all vmexit tracepointsStefan Hajnoczi
The kvm_exit tracepoint recently added the isa argument to aid decoding exit_reason. The semantics of exit_reason depend on the instruction set (vmx or svm) and the isa argument allows traces to be analyzed on other machines. Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject so these tracepoints can also be self-describing. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: Really fix HV_X64_MSR_APIC_ASSIST_PAGEMike Waychison
Commit 0945d4b228 tried to fix the get_msr path for the HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested. We should be returning 0 if the read succeeded, and passing the value back to the caller via the pdata out argument, not returning the value directly. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25KVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGEMike Waychison
"get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even though it is explicitly enumerated as something the vmm should save in msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST ioctl. Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE. We simply return the guest visible value of this register, which seems to be correct as a set on the register is validated for us already. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>