Age | Commit message (Collapse) | Author |
|
This removes the powerpc "generic" updates of vcpu->cpu in load and
put, and moves them to the various backends.
The reason is that "HV" KVM does its own sauce with that field
and the generic updates might corrupt it. The field contains the
CPU# of the -first- HW CPU of the core always for all the VCPU
threads of a core (the one that's online from a host Linux
perspective).
However, the preempt notifiers are going to be called on the
threads VCPUs when they are running (due to them sleeping on our
private waitqueue) causing unload to be called, potentially
clobbering the value.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This adds an implementation of kvm_arch_flush_shadow_memslot for
Book3S HV, and arranges for kvmppc_core_commit_memory_region to
flush the dirty log when modifying an existing slot. With this,
we can handle deletion and modification of memory slots.
kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
on Book3S HV now traverses the reverse map chains to remove any HPT
(hashed page table) entries referring to pages in the memslot. This
gets called by generic code whenever deleting a memslot or changing
the guest physical address for a memslot.
We flush the dirty log in kvmppc_core_commit_memory_region for
consistency with what x86 does. We only need to flush when an
existing memslot is being modified, because for a new memslot the
rmap array (which stores the dirty bits) is all zero, meaning that
every page is considered clean already, and when deleting a memslot
we obviously don't care about the dirty bits any more.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Now that we have an architecture-specific field in the kvm_memory_slot
structure, we can use it to store the array of page physical addresses
that we need for Book3S HV KVM on PPC970 processors. This reduces the
size of struct kvm_arch for Book3S HV, and also reduces the size of
struct kvm_arch_memory_slot for other PPC KVM variants since the fields
in it are now only compiled in for Book3S HV.
This necessitates making the kvm_arch_create_memslot and
kvm_arch_free_memslot operations specific to each PPC KVM variant.
That in turn means that we now don't allocate the rmap arrays on
Book3S PR and Book E.
Since we now unpin pages and free the slot_phys array in
kvmppc_core_free_memslot, we no longer need to do it in
kvmppc_core_destroy_vm, since the generic code takes care to free
all the memslots when destroying a VM.
We now need the new memslot to be passed in to
kvmppc_core_prepare_memory_region, since we need to initialize its
arch.slot_phys member on Book3S HV.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The generic KVM code uses SRCU (sleeping RCU) to protect accesses
to the memslots data structures against updates due to userspace
adding, modifying or removing memory slots. We need to do that too,
both to avoid accessing stale copies of the memslots and to avoid
lockdep warnings. This therefore adds srcu_read_lock/unlock pairs
around code that accesses and uses memslots.
Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
need to access the memslots, and we don't want to call the SRCU code
in real mode (since we have no assurance that it would only access
the linear mapping), we hold the SRCU read lock for the VM while
in the guest. This does mean that adding or removing memory slots
while some vcpus are executing in the guest will block for up to
two jiffies. This tradeoff is acceptable since adding/removing
memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
are performance-critical hot paths.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When running on HV aware hosts, we can not trap when the guest sets the FP
bit, so we just let it do so when it wants to, because it has full access to
MSR.
For non-HV aware hosts with an FPU (like 440), we need to also adjust the
shadow MSR though. Otherwise the guest gets an FP unavailable trap even when
it really enabled the FP bit in MSR.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We need mfdcrx to execute properly on 460 cores.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We need mtdcrx to execute properly on 460 cores.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Since we always mark pages as dirty immediately when mapping them read/write
now, there's no need for the dirty flag in our cache.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Exit traces are a lot easier to read when you don't have to remember
cryptic numbers for guest exit reasons. Symbolify them in our trace
output.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Add support for the MCSR SPR. This only implements the SPR storage
bits, not actual machine checks.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We need to make sure that vcpu->arch.pvr is initialized to a sane value,
so let's just take the host PVR.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
IAC/DAC are defined as 32 bit while they are 64 bit wide. So ONE_REG
interface is added to set/get them.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This patch adds the watchdog emulation in KVM. The watchdog
emulation is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl.
The kernel timer are used for watchdog emulation and emulates
h/w watchdog state machine. On watchdog timer expiry, it exit to QEMU
if TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how
it is configured.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
[bharat.bhushan@freescale.com: reworked patch]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: adjust to new request framework]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Requests may want to tell us that we need to go back into host state,
so add a return value for the checks.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Our prepare_to_enter helper wants to be able to return in more circumstances
to the host than only when an interrupt is pending. Broaden the interface a
bit and move even more generic code to the generic helper.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We don't need to do anything when mode is EXITING_GUEST_MODE, because
we essentially are outside of guest mode and did everything it asked
us to do by the time we check it.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We need to call kvm_guest_enter in booke and book3s, so move its
call to generic code.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Today, we disable preemption while inside guest context, because we need
to expose to the world that we are not in a preemptible context. However,
during that time we already have interrupts disabled, which would indicate
that we are in a non-preemptible context.
The reason the checks for irqs_disabled() fail for us though is that we
manually control hard IRQs and ignore all the lazy EE framework. Let's
stop doing that. Instead, let's always use lazy EE to indicate when we
want to disable IRQs, but do a special final switch that gets us into
EE disabled, but soft enabled state. That way when we get back out of
guest state, we are immediately ready to process interrupts.
This simplifies the code drastically and reduces the time that we appear
as preempt disabled.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When getting out of __vcpu_run, let's be consistent about the state we
return in. We want to always
* have IRQs enabled
* have called kvm_guest_exit before
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When going out of guest mode, indicate that we are in vcpu->mode. That way
requests from other CPUs don't needlessly need to kick us to process them,
because it'll just happen next time we enter the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The x86 implementation of KVM accounts for host time while processing
guest exits. Do the same for us.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Now that we use our generic exit helper, we can safely drop our previous
kvm_resched that we used to trigger at the beginning of the exit handler
function.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We only need to set vcpu->mode to outside once.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Now that we have very simple MMU Notifier support for e500 in place,
also add the same simple support to book3s. It gets us one step closer
to actual fast support.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We need to do the same things when preparing to enter a guest for booke and
book3s_pr cores. Fold the generic code into a generic function that both call.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We only call kvmppc_check_requests() when vcpu->requests != 0, so drop
the redundant check in the function itself
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Without trace points, debugging what exactly is going on inside guest
code can be very tricky. Add a few more trace points at places that
hopefully tell us more when things go wrong.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The e500 target has lived without mmu notifiers ever since it got
introduced, but fails for the user space check on them with hugetlbfs.
So in order to get that one working, implement mmu notifiers in a
reasonably dumb fashion and be happy. On embedded hardware, we almost
never end up with mmu notifier calls, since most people don't overcommit.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Generic KVM code might want to know whether we are inside guest context
or outside. It also wants to be able to push us out of guest context.
Add support to the BookE code for the generic vcpu->mode field that describes
the above states.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We need a central place to check for pending requests in. Add one that
only does the timer check we already do in a different place.
Later, this central function can be extended by more checks.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This is printed once for every RMA or HPT region that get
preallocated. If one preallocates hundreds of such regions
(in order to run hundreds of KVM guests), that gets rather
painful, so make it a bit quieter.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Our mapping code assumes that TLB0 entries are always mapped. However, after
calling clear_tlb_refs() this is no longer the case.
Map them dynamically if we find an entry unmapped in TLB0.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We're already counting remote TLB flushes in a variable, but don't export
it to user space yet. Do so, so we know what's going on.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Semantically, the "SYNC" cap means that we have mmu notifiers available.
Express this in our #ifdef'ery around the feature, so that we can be sure
we don't miss out on ppc targets when they get their implementation.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We want to have tracing information on guest exits for booke as well
as book3s. Since most information is identical, use a common trace point.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart.yoder@freescale.com: cleanup,fixes for conditions allowing idle]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: fix typo]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart: factored this out from idle hcall support in host patch]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
"Some highlights in addition to the usual batch of fixes:
- 64TB address space support for 64-bit processes by Aneesh Kumar
- Gavin Shan did a major cleanup & re-organization of our EEH support
code (IBM fancy PCI error handling & recovery infrastructure) which
paves the way for supporting different platform backends, along
with some rework of the PCIe code for the PowerNV platform in order
to remove home made resource allocations and instead use the
generic code (which is possible after some small improvements to it
done by Gavin).
- Uprobes support by Ananth N Mavinakayanahalli
- A pile of embedded updates from Freescale folks, including new SoC
and board supports, more KVM stuff including preparing for 64-bit
BookE KVM support, ePAPR 1.1 updates, etc..."
Fixup trivial conflicts in drivers/scsi/ipr.c
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
powerpc/iommu: Fix multiple issues with IOMMU pools code
powerpc: Fix VMX fix for memcpy case
driver/mtd:IFC NAND:Initialise internal SRAM before any write
powerpc/fsl-pci: use 'Header Type' to identify PCIE mode
powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get
powerpc: Remove tlb batching hack for nighthawk
powerpc: Set paca->data_offset = 0 for boot cpu
powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled
powerpc/mpc85xx: Update interrupt handling for IFC controller
powerpc/85xx: Enable USB support in p1023rds_defconfig
powerpc/smp: Do not disable IPI interrupts during suspend
powerpc/eeh: Fix crash on converting OF node to edev
powerpc/eeh: Lock module while handling EEH event
powerpc/kprobe: Don't emulate store when kprobe stwu r1
powerpc/kprobe: Complete kprobe and migrate exception frame
powerpc/kprobe: Introduce a new thread flag
powerpc: Remove unused __get_user64() and __put_user64()
powerpc/eeh: Global mutex to protect PE tree
powerpc/eeh: Remove EEH PE for normal PCI hotplug
...
|
|
This patch convert different functions to take virtual page number
instead of virtual address. Virtual page number is virtual address
shifted right by VPN_SHIFT (12) bits. This enable us to have an
address range of upto 76 bits.
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Brings in various bug fixes from 3.6-rcX
|
|
Critical exception on 64-bit booke uses user-visible SPRG3 as scratch.
Restore VDSO information in SPRG3 on exception prolog.
Use a common sprg3 field in PACA for all powerpc64 architectures.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Introducing kvm_arch_flush_shadow_memslot, to invalidate the
translations of a single memory slot.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The build error was caused by that builtin functions are calling
the functions implemented in modules. This error was introduced by
commit 4d8b81abc4 ("KVM: introduce readonly memslot").
The patch fixes the build error by moving function __gfn_to_hva_memslot()
from kvm_main.c to kvm_host.h and making that "inline" so that the
builtin function (kvmppc_h_enter) can use that.
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Merging critical fixes from upstream required for development.
* upstream/master: (809 commits)
libata: Add a space to " 2GB ATA Flash Disk" DMA blacklist entry
Revert "powerpc: Update g5_defconfig"
powerpc/perf: Use pmc_overflow() to detect rolled back events
powerpc: Fix VMX in interrupt check in POWER7 copy loops
powerpc: POWER7 copy_to_user/copy_from_user patch applied twice
powerpc: Fix personality handling in ppc64_personality()
powerpc/dma-iommu: Fix IOMMU window check
powerpc: Remove unnecessary ifdefs
powerpc/kgdb: Restore current_thread_info properly
powerpc/kgdb: Bail out of KGDB when we've been triggered
powerpc/kgdb: Do not set kgdb_single_step on ppc
powerpc/mpic_msgr: Add missing includes
powerpc: Fix null pointer deref in perf hardware breakpoints
powerpc: Fixup whitespace in xmon
powerpc: Fix xmon dl command for new printk implementation
xfs: check for possible overflow in xfs_ioc_trim
xfs: unlock the AGI buffer when looping in xfs_dialloc
xfs: fix uninitialised variable in xfs_rtbuf_get()
powerpc/fsl: fix "Failed to mount /dev: No such device" errors
powerpc/fsl: update defconfigs
...
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Put the parameters the right way around
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44031
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When we map a page that wasn't icache cleared before, do so when first
mapping it in KVM using the same information bits as the Linux mapping
logic. That way we are 100% sure that any page we map does not have stale
entries in the icache.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
In handling the H_CEDE hypercall, if this vcpu has already been
prodded (with the H_PROD hypercall, which Linux guests don't in fact
use), we branch to a numeric label '1f'. Unfortunately there is
another '1:' label before the one that we want to jump to. This fixes
the problem by using a textual label, 'kvm_cede_prodded'. It also
changes the label for another longish branch from '2:' to
'kvm_cede_exit' to avoid a possible future problem if code modifications
add another numeric '2:' label in between.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
After commit a2766325cf9f9, the error page is replaced by the
error code, it need not be released anymore
[ The patch has been compiling tested for powerpc ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
After commit a2766325cf9f9, the error pfn is replaced by the
error code, it need not be released anymore
[ The patch has been compiling tested for powerpc ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|