summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/paravirt.h
AgeCommit message (Collapse)Author
2013-04-30Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt update from Ingo Molnar: "Various paravirtualization related changes - the biggest one makes guest support optional via CONFIG_HYPERVISOR_GUEST" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, wakeup, sleep: Use pvops functions for changing GDT entries x86, xen, gdt: Remove the pvops variant of store_gdt. x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. x86: Make Linux guest support optional x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu
2013-04-11x86, xen, gdt: Remove the pvops variant of store_gdt.Konrad Rzeszutek Wilk
The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches: x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed. have demonstrated - there are other mechanism by which the GDT is saved and reloaded during early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can and can eliminate it. The other areas where the store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-10x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metalBoris Ostrovsky
Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com Tested-by: Josh Boyer <jwboyer@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
2012-12-18x86, paravirt: fix build error when thp is disabledDavid Rientjes
With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks because set_pmd_at() is undeclared: mm/memory.c: In function 'do_pmd_numa_page': mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at' mm/mprotect.c: In function 'change_pmd_protnuma': mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at' This is because paravirt defines set_pmd_at() only when CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded. The fix is to define it for all CONFIG_PARAVIRT configurations. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-26Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros
2012-07-05Merge branch 'x86/cpu' into perf/coreIngo Molnar
Merge this branch because we changed the wrmsr*_safe() API and there's a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-28x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_rangeAlex Shi
x86 has no flush_tlb_range support in instruction level. Currently the flush_tlb_range just implemented by flushing all page table. That is not the best solution for all scenarios. In fact, if we just use 'invlpg' to flush few lines from TLB, we can get the performance gain from later remain TLB lines accessing. But the 'invlpg' instruction costs much of time. Its execution time can compete with cr3 rewriting, and even a bit more on SNB CPU. So, on a 512 4KB TLB entries CPU, the balance points is at: (512 - X) * 100ns(assumed TLB refill cost) = X(TLB flush entries) * 100ns(assumed invlpg cost) Here, X is 256, that is 1/2 of 512 entries. But with the mysterious CPU pre-fetcher and page miss handler Unit, the assumed TLB refill cost is far lower then 100ns in sequential access. And 2 HT siblings in one core makes the memory access more faster if they are accessing the same memory. So, in the patch, I just do the change when the target entries is less than 1/16 of whole active tlb entries. Actually, I have no data support for the percentage '1/16', so any suggestions are welcomed. As to hugetlb, guess due to smaller page table, and smaller active TLB entries, I didn't see benefit via my benchmark, so no optimizing now. My micro benchmark show in ideal scenarios, the performance improves 70 percent in reading. And in worst scenario, the reading/writing performance is similar with unpatched 3.4-rc4 kernel. Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP 'always': multi thread testing, '-t' paramter is thread number: with patch unpatched 3.4-rc4 ./mprotect -t 1 14ns 24ns ./mprotect -t 2 13ns 22ns ./mprotect -t 4 12ns 19ns ./mprotect -t 8 14ns 16ns ./mprotect -t 16 28ns 26ns ./mprotect -t 32 54ns 51ns ./mprotect -t 128 200ns 199ns Single process with sequencial flushing and memory accessing: with patch unpatched 3.4-rc4 ./mprotect 7ns 11ns ./mprotect -p 4096 -l 8 -n 10240 21ns 21ns [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com has additional performance numbers. ] Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07x86, pvops: Remove hooks for {rd,wr}msr_safe_regsAndre Przywara
There were paravirt_ops hooks for the full register set variant of {rd,wr}msr_safe which are actually not used by anyone anymore. Remove them to make the code cleaner and avoid silent breakages when the pvops members were uninitialized. This has been boot-tested natively and under Xen with PVOPS enabled and disabled on one machine. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/1338562358-28182-2-git-send-email-bp@amd64.org Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-06x86: Add rdpmcl()Andi Kleen
Add a version of rdpmc() that directly reads into a u64 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338944211-28275-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-19x86, paravirt: Replace GET_CR2_INTO_RCX with GET_CR2_INTO_RAXH. Peter Anvin
GET_CR2_INTO_RCX is asinine: it is only used in one place, the actual paravirt call returns the value in %rax, not %rcx; and the one place that wants it wants the result in %r9. We actually generate as a result of this call: call ... movq %rax, %rcx xorq %rax, %rax /* this value isn't even used... */ movq %rcx, %r9 At least make the macro do what the paravirt call does, which is put the value into %rax. Nevermind the fact that the macro clobbers all the volatile registers. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1334794610-5546-4-git-send-email-hpa@zytor.com Cc: Glauber de Oliveira Costa <glommer@parallels.com>
2012-03-24Merge tag 'bug-for-3.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/*.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-03-04BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-14KVM guest: Add a pv_ops stub for steal timeGlauber Costa
This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-26thp: fix PARAVIRT x86 32bit noPAEAndrea Arcangeli
This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n. The #ifdef that this patch removes was erratically introduced to fix a build error for noPAE (where pmd.pmd doesn't exist). So then the kernel built but it failed at runtime because set_pmd_at was a noop. This will correct it by enabling set_pmd_at for noPAE mode too. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: werner <w.landgraf@ru.ru> Reported-by: Minchan Kim <minchan.kim@gmail.com> Tested-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-14thp: add pmd paravirt opsAndrea Arcangeli
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs pmd_update_defer), but this is to keep full simmetry with pte paravirt ops, which looks cleaner and simpler from a common code POV. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-27x86, paravirt: Use native_halt on a halt, not native_safe_haltCliff Wickman
halt() should use native_halt() safe_halt() uses native_safe_halt() If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as static inline void halt(void) { PVOP_VCALL0(pv_irq_ops.safe_halt); } Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is static inline void halt(void) { native_halt(); } So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt() for a halt() is an oversight. Am I missing something? It probably hasn't shown up as a problem because the local apic is disabled on a shutdown or restart. But if we disable interrupts and call halt() we shouldn't expect that the halt() will re-enable interrupts. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-11tracing: Force arch_local_irq_* notrace for paravirtSteven Rostedt
When running ktest.pl randconfig tests, I would sometimes trigger a lockdep annotation bug (possible reason: unannotated irqs-on). This triggering happened right after function tracer self test was executed. After doing a config bisect I found that this was caused with having function tracer, paravirt guest, prove locking, and rcu torture all enabled. The rcu torture just enhanced the likelyhood of triggering the bug. Prove locking was needed, since it was the thing that was bugging. Function tracer would trace and disable interrupts in all sorts of funny places. paravirt guest would turn arch_local_irq_* into functions that would be traced. Besides the fact that tracing arch_local_irq_* is just a bad idea, this is what is happening. The bug happened simply in the local_irq_restore() code: if (raw_irqs_disabled_flags(flags)) { \ raw_local_irq_restore(flags); \ trace_hardirqs_off(); \ } else { \ trace_hardirqs_on(); \ raw_local_irq_restore(flags); \ } \ The raw_local_irq_restore() was defined as arch_local_irq_restore(). Now imagine, we are about to enable interrupts. We go into the else case and call trace_hardirqs_on() which tells lockdep that we are enabling interrupts, so it sets the current->hardirqs_enabled = 1. Then we call raw_local_irq_restore() which calls arch_local_irq_restore() which gets traced! Now in the function tracer we disable interrupts with local_irq_save(). This is fine, but flags is stored that we have interrupts disabled. When the function tracer calls local_irq_restore() it does it, but this time with flags set as disabled, so we go into the if () path. This keeps interrupts disabled and calls trace_hardirqs_off() which sets current->hardirqs_enabled = 0. When the tracer is finished and proceeds with the original code, we enable interrupts but leave current->hardirqs_enabled as 0. Which now breaks lockdeps internal processing. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflagsLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags: Fix IRQ flag handling naming MIPS: Add missing #inclusions of <linux/irq.h> smc91x: Add missing #inclusion of <linux/irq.h> Drop a couple of unnecessary asm/system.h inclusions SH: Add missing consts to sys_execve() declaration Blackfin: Rename IRQ flags handling functions Blackfin: Add missing dep to asm/irqflags.h Blackfin: Rename DES PC2() symbol to avoid collision Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header Blackfin: Split PLL code from mach-specific cdef headers
2010-10-07Fix IRQ flag handling namingDavid Howells
Fix the IRQ flag handling naming. In linux/irqflags.h under one configuration, it maps: local_irq_enable() -> raw_local_irq_enable() local_irq_disable() -> raw_local_irq_disable() local_irq_save() -> raw_local_irq_save() ... and under the other configuration, it maps: raw_local_irq_enable() -> local_irq_enable() raw_local_irq_disable() -> local_irq_disable() raw_local_irq_save() -> local_irq_save() ... This is quite confusing. There should be one set of names expected of the arch, and this should be wrapped to give another set of names that are expected by users of this facility. Change this to have the arch provide: flags = arch_local_save_flags() flags = arch_local_irq_save() arch_local_irq_restore(flags) arch_local_irq_disable() arch_local_irq_enable() arch_irqs_disabled_flags(flags) arch_irqs_disabled() arch_safe_halt() Then linux/irqflags.h wraps these to provide: raw_local_save_flags(flags) raw_local_irq_save(flags) raw_local_irq_restore(flags) raw_local_irq_disable() raw_local_irq_enable() raw_irqs_disabled_flags(flags) raw_irqs_disabled() raw_safe_halt() with type checking on the flags 'arguments', and then wraps those to provide: local_save_flags(flags) local_irq_save(flags) local_irq_restore(flags) local_irq_disable() local_irq_enable() irqs_disabled_flags(flags) irqs_disabled() safe_halt() with tracing included if enabled. The arch functions can now all be inline functions rather than some of them having to be macros. Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300] Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile] Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze] Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM] Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR] Acked-by: Tony Luck <tony.luck@intel.com> [IA-64] Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R] Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU] Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS] Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC] Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC] Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390] Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score] Acked-by: Matt Fleming <matt@console-pimps.org> [SH] Acked-by: David S. Miller <davem@davemloft.net> [Sparc] Acked-by: Chris Zankel <chris@zankel.net> [Xtensa] Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha] Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300] Cc: starvik@axis.com [CRIS] Cc: jesper.nilsson@axis.com [CRIS] Cc: linux-cris-kernel@axis.com
2010-08-24x86, paravirt: Remove alloc_pmd_clone hook, only used by VMIAlok Kataria
VMI was the only user of the alloc_pmd_clone hook, given that VMI is now removed we can also remove this hook. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-02-27x86, paravirt: Remove kmap_atomic_pte paravirt op.Ian Campbell
Now that both Xen and VMI disable allocations of PTE pages from high memory this paravirt op serves no further purpose. This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping highpte pages". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com> Acked-by: Alok Kataria <akataria@vmware.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-14locking: Convert __raw_spin* functions to arch_spin*Thomas Gleixner
Name space cleanup. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
2009-12-14locking: Convert raw_spinlock to arch_spinlockThomas Gleixner
The raw_spin* namespace was taken by lockdep for the architecture specific implementations. raw_spin_* would be the ideal name space for the spinlocks which are not converted to sleeping locks in preempt-rt. Linus suggested to convert the raw_ to arch_ locks and cleanup the name space instead of using an artifical name like core_spin, atomic_spin or whatever No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
2009-10-13x86/paravirt: Use normal calling sequences for irq enable/disableJeremy Fitzhardinge
Bastian Blank reported a boot crash with stackprotector enabled, and debugged it back to edx register corruption. For historical reasons irq enable/disable/save/restore had special calling sequences to make them more efficient. With the more recent introduction of higher-level and more general optimisations this is no longer necessary so we can just use the normal PVOP_ macros. This fixes some residual bugs in the old implementations which left edx liable to inadvertent clobbering. Also, fix some bugs in __PVOP_VCALLEESAVE which were revealed by actual use. Reported-by: Bastian Blank <bastian@waldi.eu.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4AD3BC9B.7040501@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits) x86: Move get/set_wallclock to x86_platform_ops x86: platform: Fix section annotations x86: apic namespace cleanup x86: Distangle ioapic and i8259 x86: Add Moorestown early detection x86: Add hardware_subarch ID for Moorestown x86: Add early platform detection x86: Move tsc_init to late_time_init x86: Move tsc_calibration to x86_init_ops x86: Replace the now identical time_32/64.c by time.c x86: time_32/64.c unify profile_pc x86: Move calibrate_cpu to tsc.c x86: Make timer setup and global variables the same in time_32/64.c x86: Remove mca bus ifdef from timer interrupt x86: Simplify timer_ack magic in time_32.c x86: Prepare unification of time_32/64.c x86: Remove do_timer hook x86: Add timer_init to x86_init_ops x86: Move percpu clockevents setup to x86_init_ops x86: Move xen_post_allocator_init into xen_pagetable_setup_done ... Fix up conflicts in arch/x86/include/asm/io_apic.h
2009-09-16x86: Move get/set_wallclock to x86_platform_opsFeng Tang
get/set_wallclock() have already a set of platform dependent implementations (default, EFI, paravirt). MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-09-01Merge branch 'x86/paravirt' into x86/cpuIngo Molnar
Conflicts: arch/x86/include/asm/paravirt.h Manual merge: arch/x86/include/asm/paravirt_types.h Merge reason: x86/paravirt conflicts non-trivially with x86/cpu, resolve it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-31x86, msr: Rewrite AMD rd/wrmsr variantsBorislav Petkov
Switch them to native_{rd,wr}msr_safe_regs and remove pv_cpu_ops.read_msr_amd. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31x86, msr: Add rd/wrmsr interfaces with preset registersBorislav Petkov
native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow presetting of a subset of eight x86 GPRs before executing the rd/wrmsr instructions. This is needed at least on AMD K8 for accessing an erratum workaround MSR. Originally based on an idea by H. Peter Anvin. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31x86: Move tsc_calibration to x86_init_opsThomas Gleixner
TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Add timer_init to x86_init_opsThomas Gleixner
The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Move percpu clockevents setup to x86_init_opsThomas Gleixner
paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary evtl. hotplugabble CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Move xen_post_allocator_init into xen_pagetable_setup_doneThomas Gleixner
We really do not need two paravirt/x86_init_ops functions which are called in two consecutive source lines. Move the only user of post_allocator_init into the already existing pagetable_setup_done function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Move paravirt pagetable_setup to x86_init_opsThomas Gleixner
Replace more paravirt hackery by proper x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Move paravirt banner printout to x86_init_opsThomas Gleixner
Replace another obscure paravirt magic and move it to x86_init_ops. Such a hook is also useful for embedded and special hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Replace ARCH_SETUP by a proper x86_init_opsThomas Gleixner
ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-06-17x86/paravirt: split paravirt definitions into paravirt_types.hJeremy Fitzhardinge
Split the monolithic asm/paravirt.h into separate paravirt.h (inlines and other "active" definitions), and paravirt_types.h (types, constants and other "passive" definitions). This makes it easier to use the type/constant definitions without pulling in everything else and causing circular dependency problems. [ Impact: cleanup ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-06-10Merge branch 'x86-xen-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...
2009-05-15x86: Fix performance regression caused by paravirt_ops on native kernelsJeremy Fitzhardinge
Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: "Li Xin" <xin.li@intel.com> Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08Merge branch 'x86/urgent' into x86/xenIngo Molnar
Conflicts: arch/frv/include/asm/pgtable.h arch/x86/include/asm/required-features.h arch/x86/xen/mmu.c Merge reason: x86/xen was on a .29 base still, move it to a fresher branch and pick up Xen fixes as well, plus resolve conflicts Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09x86: fix set_fixmap to use phys_addr_tMasami Hiramatsu
Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07Merge commit 'origin/master' into for-linus/xen/masterJeremy Fitzhardinge
* commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c
2009-03-30x86/paravirt: finish change from lazy cpu to context switch start/endJeremy Fitzhardinge
Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-03-30x86/paravirt: flush pending mmu updates on context switchJeremy Fitzhardinge
Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-03-30x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switchJeremy Fitzhardinge
Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-03-19x86: with the last user gone, remove set_pte_presentJeremy Fitzhardinge
Impact: cleanup set_pte_present() is no longer used, directly or indirectly, so remove it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> LKML-Reference: <1237406613-2929-2-git-send-email-jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18Merge branches 'x86/cleanups', 'x86/cpu', 'x86/debug', 'x86/mce2', 'x86/mm', ↵Ingo Molnar
'x86/mtrr', 'x86/setup', 'x86/setup-memory', 'x86/urgent', 'x86/uv', 'x86/x2apic' and 'linus' into x86/core Conflicts: arch/parisc/kernel/irq.c
2009-03-17x86, paravirt: prevent gcc from generating the wrong addressing modeJeremy Fitzhardinge
Impact: fix crash on VMI (VMware) When we generate a call sequence for calling a paravirtualized function, we presume that the generated code is "call *0xXXXXX", which is a 6 byte opcode; this is larger than a normal direct call, and so we can patch a direct call over it. At the moment, however we give gcc enough rope to hang us by putting the address in a register and generating a two byte indirect-via-register call. Prevent this by explicitly dereferencing the function pointer and passing it into the asm as a constant. This prevents crashes in VMI, as it cannot handle unpatchable callsites. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> LKML-Reference: <49BEEDC2.2070809@goop.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vm86: fix preemption bug x86, olpc: fix model detection without OFW x86, hpet: fix for LS21 + HPET = boot hang x86: CPA avoid repeated lazy mmu flush x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem x86/cpa: make sure cpa is safe to call in lazy mmu mode x86, ptrace, mm: fix double-free on race