summaryrefslogtreecommitdiff
path: root/drivers/kvm/mmu.c
AgeCommit message (Collapse)Author
2007-05-03KVM: fix an if() conditionAdrian Bunk
It might have worked in this case since PT_PRESENT_MASK is 1, but let's express this correctly. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Per-vcpu statisticsAvi Kivity
Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: MMU: Avoid heavy ASSERT at non debug mode.Yaozu Dong
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Retry sleeping allocation if atomic allocation failsAvi Kivity
This avoids -ENOMEM under memory pressure. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Use slab caches to allocate mmu data structuresAvi Kivity
Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Handle partial pae pdptrAvi Kivity
Some guests (Solaris) do not set up all four pdptrs, but leave some invalid. kvm incorrectly treated these as valid page directories, pinning the wrong pages and causing general confusion. Fix by checking the valid bit of a pae pdpte. This closes sourceforge bug 1698922. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Simply gfn_to_page()Avi Kivity
Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Add mmu cache clear functionDor Laor
Functions that play around with the physical memory map need a way to clear mappings to possibly nonexistent or invalid memory. Both the mmu cache and the processor tlb are cleared. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Use list_move()Avi Kivity
Use list_move() where possible. Noticed by Dor Laor. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: MMU: Fix hugepage pdes mapping same physical address with different accessAvi Kivity
The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map the same physical address, they share the same shadow page. This is a fairly common case (kernel mappings on i386 nonpae Linux, for example). However, if the two pdes map the same memory but with different permissions, kvm will happily use the cached shadow page. If the access through the more permissive pde will occur after the access to the strict pde, an endless pagefault loop will be generated and the guest will make no progress. Fix by making the access permissions part of the cache lookup key. The fix allows Xen pae to boot on kvm and run guest domains. Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: MMU: Remove global pte trackingAvi Kivity
The initial, noncaching, version of the kvm mmu flushed the all nonglobal shadow page table translations (much like a native tlb flush). The new implementation flushes translations only when they change, rendering global pte tracking superfluous. This removes the unused tracking mechanism and storage space. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Avoid guest virtual addresses in string pio userspace interfaceAvi Kivity
The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Fix bogus sign extension in mmu mapping auditAvi Kivity
When auditing a 32-bit guest on a 64-bit host, sign extension of the page table directory pointer table index caused bogus addresses to be shown on audit errors. Fix by declaring the index unsigned. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-04-19KVM: Fix off-by-one when writing to a nonpae guest pdeAvi Kivity
Nonpae guest pdes are shadowed by two pae ptes, so we double the offset twice: once to account for the pte size difference, and once because we need to shadow pdes for a single guest pde. But when writing to the upper guest pde we also need to truncate the lower bits, otherwise the multiply shifts these bits into the pde index and causes an access to the wrong shadow pde. If we're at the end of the page (accessing the very last guest pde) we can even overflow into the next host page and oops. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-18KVM: MMU: Fix host memory corruption on i386 with >= 4GB ramAvi Kivity
PAGE_MASK is an unsigned long, so using it to mask physical addresses on i386 (which are 64-bit wide) leads to truncation. This can result in page->private of unrelated memory pages being modified, with disasterous results. Fix by not using PAGE_MASK for physical addresses; instead calculate the correct value directly from PAGE_SIZE. Also fix a similar BUG_ON(). Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-18KVM: MMU: Fix guest writes to nonpae pdeAvi Kivity
KVM shadow page tables are always in pae mode, regardless of the guest setting. This means that a guest pde (mapping 4MB of memory) is mapped to two shadow pdes (mapping 2MB each). When the guest writes to a pte or pde, we intercept the write and emulate it. We also remove any shadowed mappings corresponding to the write. Since the mmu did not account for the doubling in the number of pdes, it removed the wrong entry, resulting in a mismatch between shadow page tables and guest page tables, followed shortly by guest memory corruption. This patch fixes the problem by detecting the special case of writing to a non-pae pde and adjusting the address and number of shadow pdes zapped accordingly. Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Use page_private()/set_page_private() apisMarkus Rechberger
Besides using an established api, this allows using kvm in older kernels. Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-02-09[PATCH] misc NULL noise removalAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] KVM: MMU: Report nx faults to the guestAvi Kivity
With the recent guest page fault change, we perform access checks on our own instead of relying on the cpu. This means we have to perform the nx checks as well. Software like the google toolbar on windows appears to rely on this somehow. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] KVM: MMU: Perform access checks in walk_addr()Avi Kivity
Check pte permission bits in walk_addr(), instead of scattering the checks all over the code. This has the following benefits: 1. We no longer set the accessed bit for accessed which fail permission checks. 2. Setting the accessed bit is simplified. 3. Under some circumstances, we used to pretend a page fault was fixed when it would actually fail the access checks. This caused an unnecessary vmexit. 4. The error code for guest page faults is now correct. The fix helps netbsd further along booting, and allows kvm to pass the new mmu testsuite. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-06[PATCH] KVM: Simplify mmu_alloc_roots()Ingo Molnar
Small optimization/cleanup: page == page_header(page->page_hpa) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: Avoid oom on cr3 switchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: add audit code to check mappings, etc are correctAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Flush guest tlb when reducing permissions on a pteAvi Kivity
If we reduce permissions on a pte, we must flush the cached copy of the pte from the guest's tlb. This is implemented at the moment by flushing the entire guest tlb, and can be improved by flushing just the relevant virtual address, if it is known. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Detect oom conditions and propagate error to userspaceAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Replace atomic allocations by preallocated objectsAvi Kivity
The mmu sometimes needs memory for reverse mapping and parent pte chains. however, we can't allocate from within the mmu because of the atomic context. So, move the allocations to a central place that can be executed before the main mmu machinery, where we can bail out on failure before any damage is done. (error handling is deffered for now, but the basic structure is there) Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Free pages on kvm destructionAvi Kivity
Because mmu pages have attached rmap and parent pte chain structures, we need to zap them before freeing so the attached structures are freed. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Never free a shadow page actively serving as a rootAvi Kivity
We always need cr3 to point to something valid, so if we detect that we're freeing a root page, simply push it back to the top of the active list. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Page table write flood protectionAvi Kivity
In fork() (or when we protect a page that is no longer a page table), we can experience floods of writes to a page, which have to be emulated. This is expensive. So, if we detect such a flood, zap the page so subsequent writes can proceed natively. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: If an empty shadow page is not empty, report more infoAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Ensure freed shadow pages are cleanAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: <ove is_empty_shadow_page() above kvm_mmu_free_page()Avi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Handle misaligned accesses to write protected guest page ↵Avi Kivity
tables A misaligned access affects two shadow ptes instead of just one. Since a misaligned access is unlikely to occur on a real page table, just zap the page out of existence, avoiding further trouble. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Remove release_pt_page_64()Avi Kivity
Unused. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Remove invlpg interceptionAvi Kivity
Since we write protect shadowed guest page tables, there is no need to trap page invalidations (the guest will always change the mapping before issuing the invlpg instruction). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: oom handlingAvi Kivity
When beginning to process a page fault, make sure we have enough shadow pages available to service the fault. If not, free some pages. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: kvm_mmu_put_page() only removes one link to the pageAvi Kivity
... and so must not free it unconditionally. Move the freeing to kvm_mmu_zap_page(). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Implement child shadow unlinkingAvi Kivity
When removing a page table, we must maintain the parent_pte field all child shadow page tables. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: If emulating an instruction fails, try unprotecting the pageAvi Kivity
A page table may have been recycled into a regular page, and so any instruction can be executed on it. Unprotect the page and let the cpu do its thing. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Zap shadow page table entries on writes to guest page tablesAvi Kivity
Iterate over all shadow pages which correspond to a the given guest page table and remove the mappings. A subsequent page fault will reestablish the new mapping. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Support emulated writes into RAMAvi Kivity
As the mmu write protects guest page table, we emulate those writes. Since they are not mmio, there is no need to go to userspace to perform them. So, perform the writes in the kernel if possible, and notify the mmu about them so it can take the approriate action. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Let the walker extract the target page gfn from the pteAvi Kivity
This fixes a problem where set_pte_common() looked for shadowed pages based on the page directory gfn (a huge page) instead of the actual gfn being mapped. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Write protect guest pages when a shadow is created for themAvi Kivity
When we cache a guest page table into a shadow page table, we need to prevent further access to that page by the guest, as that would render the cache incoherent. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Shadow page table cachingAvi Kivity
Define a hashtable for caching shadow page tables. Look up the cache on context switch (cr3 change) or during page faults. The key to the cache is a combination of - the guest page table frame number - the number of paging levels in the guest * we can cache real mode, 32-bit mode, pae, and long mode page tables simultaneously. this is useful for smp bootup. - the guest page table table * some kernels use a page as both a page table and a page directory. this allows multiple shadow pages to exist for that page, one per level - the "quadrant" * 32-bit mode page tables span 4MB, whereas a shadow page table spans 2MB. similarly, a 32-bit page directory spans 4GB, while a shadow page directory spans 1GB. the quadrant allows caching up to 4 shadow page tables for one guest page in one level. - a "metaphysical" bit * for real mode, and for pse pages, there is no guest page table, so set the bit to avoid write protecting the page. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Make kvm_mmu_alloc_page() return a kvm_mmu_page pointerAvi Kivity
This allows further manipulation on the shadow page table. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MU: Special treatment for shadow pae root pagesAvi Kivity
Since we're not going to cache the pae-mode shadow root pages, allocate a single pae shadow that will hold the four lower-level pages, which will act as roots. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06[PATCH] KVM: MMU: Implement simple reverse mappingAvi Kivity
Keep in each host page frame's page->private a pointer to the shadow pte which maps it. If there are multiple shadow ptes mapping the page, set bit 0 of page->private, and use the rest as a pointer to a linked list of all such mappings. Reverse mappings are needed because we when we cache shadow page tables, we must protect the guest page tables from being modified by the guest, as that would invalidate the cached ptes. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30[PATCH] kvm: fix GFP_KERNEL allocation in atomic section in ↵Ingo Molnar
kvm_dev_ioctl_create_vcpu() fix an GFP_KERNEL allocation in atomic section: kvm_dev_ioctl_create_vcpu() called kvm_mmu_init(), which calls alloc_pages(), while holding the vcpu. The fix is to set up the MMU state in two phases: kvm_mmu_create() and kvm_mmu_setup(). (NOTE: free_vcpus does an kvm_mmu_destroy() call so there's no need for any extra teardown branch on allocation/init failure here.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30[PATCH] KVM: Simplify is_long_mode()Avi Kivity
Instead of doing tricky stuff with the arch dependent virtualization registers, take a peek at the guest's efer. This simlifies some code, and fixes some confusion in the mmu branch. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] KVM: Use more traditional error handling in kvm_mmu_init()Avi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>