summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/time.c
AgeCommit message (Collapse)Author
2010-05-12powerpc/perf_event: Fix oops due to perf_event_do_pending callPaul Mackerras
Anton Blanchard found that large POWER systems would occasionally crash in the exception exit path when profiling with perf_events. The symptom was that an interrupt would occur late in the exit path when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts should be hard-disabled at this point but they were enabled. Because the interrupt was not recoverable the system panicked. The reason is that the exception exit path was calling perf_event_do_pending after hard-disabling interrupts, and perf_event_do_pending will re-enable interrupts. The simplest and cleanest fix for this is to use the same mechanism that 32-bit powerpc does, namely to cause a self-IPI by setting the decrementer to 1. This means we can remove the tests in the exception exit path and raw_local_irq_restore. This also makes sure that the call to perf_event_do_pending from timer_interrupt() happens within irq_enter/irq_exit. (Note that calling perf_event_do_pending from timer_interrupt does not mean that there is a possible 1/HZ latency; setting the decrementer to 1 ensures that the timer interrupt will happen immediately, i.e. within one timebase tick, which is a few nanoseconds or 10s of nanoseconds.) Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17powerpc: Add timer, performance monitor and machine check counts to ↵Anton Blanchard
/proc/interrupts With NO_HZ it is useful to know how often the decrementer is going off. The patch below adds an entry for it and also adds it into the /proc/stat summaries. While here, I added performance monitoring and machine check exceptions. I found it useful to keep an eye on the PMU exception rate when using the perf tool. Since it's possible to take a completely handled machine check on a System p box it also sounds like a good idea to keep a machine check summary. The event naming matches x86 to keep gratuitous differences to a minimum. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-09powerpc: Only print clockevent settings onceAnton Blanchard
The clockevent multiplier and shift is useful information, but we only need to print it once. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03powerpc: Replace per_cpu(, smp_processor_id()) with __get_cpu_var()Anton Blanchard
The cputime code has a few places that do per_cpu(, smp_processor_id()). Replace them with __get_cpu_var(). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-01-15powerpc: Fix decrementer setup on 1GHz boardsStefan Roese
We noticed that recent kernels didn't boot on our 1GHz Canyonlands 460EX boards anymore. As it seems, patch 8d165db1 [powerpc: Improve decrementer accuracy] introduced this problem. The routine div_sc() overflows with shift = 32 resulting in this incorrect setup: time_init: decrementer frequency = 1000.000012 MHz time_init: processor frequency = 1000.000012 MHz clocksource: timebase mult[400000] shift[22] registered clockevent: decrementer mult[33] shift[32] cpu[0] This patch now introduces a local div_dc64() version of this function so that this overflow doesn't happen anymore. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Detlev Zundel <dzu@denx.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-12Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits) powerpc: Fix usage of 64-bit instruction in 32-bit altivec code MAINTAINERS: Add PowerPC patterns powerpc/pseries: Track previous CPPR values to correctly EOI interrupts powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP powerpc: Make "intspec" pointers in irq_host->xlate() const powerpc/8xx: DTLB Miss cleanup powerpc/8xx: Remove DIRTY pte handling in DTLB Error. powerpc/8xx: Start using dcbX instructions in various copy routines powerpc/8xx: Restore _PAGE_WRITETHRU powerpc/8xx: Add missing Guarded setting in DTLB Error. powerpc/8xx: Fixup DAR from buggy dcbX instructions. powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions. powerpc/8xx: Update TLB asm so it behaves as linux mm expects. powerpc/8xx: Invalidate non present TLBs powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate pseries/pseries: Add code to online/offline CPUs of a DLPAR node powerpc: stop_this_cpu: remove the cpu from the online map. powerpc/pseries: Add kernel based CPU DLPAR handling sysfs/cpu: Add probe/release files powerpc/pseries: Kernel DLPAR Infrastructure ...
2009-12-09Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Conflicts: include/linux/kvm.h
2009-12-09Merge branch 'timers-for-linus-urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Fix /proc/timer_list regression itimers: Fix racy writes to cpu_itimer fields timekeeping: Fix clock_gettime vsyscall time warp
2009-12-09Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers, init: Limit the number of per cpu calibration bootup messages posix-cpu-timers: optimize and document timer_create callback clockevents: Add missing include to pacify sparse x86: vmiclock: Fix printk format x86: Fix printk format due to variable type change sparc: fix printk for change of variable type clocksource/events: Fix fallout of generic code changes nohz: Allow 32-bit machines to sleep for more than 2.15 seconds nohz: Track last do_timer() cpu nohz: Prevent clocksource wrapping during idle nohz: Type cast printk argument mips: Use generic mult/shift factor calculation for clocks clocksource: Provide a generic mult/shift factor calculation clockevents: Use u32 for mult and shift factors nohz: Introduce arch_needs_cpu nohz: Reuse ktime in sub-functions of tick_check_idle. time: Remove xtime_cache time: Implement logarithmic time accumulation
2009-11-17timekeeping: Fix clock_gettime vsyscall time warpLin Ming
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-15Merge branches 'perf/powerpc' and 'perf/bench' into perf/coreIngo Molnar
Merge reason: Both 'perf bench' and the pending PowerPC changes are now ready for the next merge window. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-13clocksource/events: Fix fallout of generic code changesThomas Gleixner
powerpc grew a new warning due to the type change of clockevent->mult. The architectures which use parts of the generic time keeping infrastructure tripped over my wrong assumption that clocksource_register is only used when GENERIC_TIME=y. I should have looked and also I should have known better. These renitent Gaul villages are racking my nerves. Some serious deprecating is due. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-11Merge commit 'origin/master' into nextBenjamin Herrenschmidt
2009-11-05powerpc: Avoid giving out RTC dates below EPOCHBenjamin Herrenschmidt
Doing so causes xtime to be negative which crashes the timekeeping code in funny ways when doing suspend/resume Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05Export symbols for KVM moduleAlexander Graf
We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-28powerpc: tracing: Add powerpc tracepoints for timer entry and exitAnton Blanchard
We can monitor the effectiveness of our power management of both the kernel and hypervisor by probing the timer interrupt. For example, on this box we see 10.37s timer interrupts on an idle core: <idle>-0 [010] 3900.671297: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3900.671302: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3911.042963: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3911.042968: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3921.414630: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3921.414635: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 Since we have a 207MHz decrementer it will go negative and fire every 10.37s even if Linux is completely idle. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-09-23Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: itimers: Add tracepoints for itimer hrtimer: Add tracepoint for hrtimers timers: Add tracepoints for timer_list timers cputime: Optimize jiffies_to_cputime(1) itimers: Simplify arm_timer() code a bit itimers: Fix periodic tics precision itimers: Merge ITIMER_VIRT and ITIMER_PROF Trivial header file include conflicts in kernel/fork.c
2009-09-21perf: Do the big rename: Performance Counters -> Performance EventsIngo Molnar
Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits) time: Prevent 32 bit overflow with set_normalized_timespec() clocksource: Delay clocksource down rating to late boot clocksource: clocksource_select must be called with mutex locked clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash timers: Drop a function prototype clocksource: Resolve cpu hotplug dead lock with TSC unstable timer.c: Fix S/390 comments timekeeping: Fix invalid getboottime() value timekeeping: Fix up read_persistent_clock() breakage on sh timekeeping: Increase granularity of read_persistent_clock(), build fix time: Introduce CLOCK_REALTIME_COARSE x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown clocksource: Avoid clocksource watchdog circular locking dependency clocksource: Protect the watchdog rating changes with clocksource_mutex clocksource: Call clocksource_change_rating() outside of watchdog_lock timekeeping: Introduce read_boot_clock timekeeping: Increase granularity of read_persistent_clock() timekeeping: Update clocksource with stop_machine timekeeping: Add timekeeper read_clock helper functions timekeeping: Move NTP adjusted clock multiplier to struct timekeeper ... Fix trivial conflict due to MIPS lemote -> loongson renaming.
2009-08-29Merge branch 'timers/posixtimers' into timers/tracingThomas Gleixner
Merge reason: timer tracepoint patches depend on both branches Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-28powerpc: Properly start decrementer on BookE secondary CPUsBenjamin Herrenschmidt
This moves the code to start the decrementer on 40x and BookE into a separate function which is now called from time_init() and secondary_time_init(), before the respective clock sources are registered. We also remove the 85xx specific code for doing it from the platform code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-23timekeeping: Increase granularity of read_persistent_clock(), build fixMartin Schwidefsky
Fix the following build problem on powerpc: arch/powerpc/kernel/time.c: In function 'read_persistent_clock': arch/powerpc/kernel/time.c:788: error: 'return' with a value, in function returning void arch/powerpc/kernel/time.c:791: error: 'return' with a value, in function returning void Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: dwalker@fifo99.com Cc: johnstul@us.ibm.com LKML-Reference: <20090822222313.74b9619c@skybase> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-20powerpc: Use DIV_ROUND_CLOSEST in time init codeJulia Lawall
The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d but is perhaps more readable. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @haskernel@ @@ #include <linux/kernel.h> @depends on haskernel@ expression x,__divisor; @@ - (((x) + ((__divisor) / 2)) / (__divisor)) + DIV_ROUND_CLOSEST(x,__divisor) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-15timekeeping: Increase granularity of read_persistent_clock()Martin Schwidefsky
The persistent clock of some architectures (e.g. s390) have a better granularity than seconds. To reduce the delta between the host clock and the guest clock in a virtualized system change the read_persistent_clock function to return a struct timespec. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134811.013873340@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-03cputime: Optimize jiffies_to_cputime(1)Stanislaw Gruszka
For powerpc with CONFIG_VIRT_CPU_ACCOUNTING jiffies_to_cputime(1) is not compile time constant and run time calculations are quite expensive. To optimize we use precomputed value. For all other architectures is is preprocessor definition. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-5-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18perf_counter: powerpc: Enable use of software counters on 32-bit powerpcPaul Mackerras
This enables the perf_counter subsystem on 32-bit powerpc. Since we don't have any support for hardware counters on 32-bit powerpc yet, only software counters can be used. Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as 64-bit, the main thing this does is add an implementation of set_perf_counter_pending(). This needs to arrange for perf_counter_do_pending() to be called when interrupts are enabled. Rather than add code to local_irq_restore as 64-bit does, the 32-bit set_perf_counter_pending() generates an interrupt by setting the decrementer to 1 so that a decrementer interrupt will become pending in 1 or 2 timebase ticks (if a decrementer interrupt isn't already pending). When interrupts are enabled, timer_interrupt() will be called, and some new code in there calls perf_counter_do_pending(). We use a per-cpu array of flags to indicate whether we need to call perf_counter_do_pending() or not. This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT, which is selected by processor families for which we have hardware PMU support (currently only PPC64), and PPC_PERF_CTRS, which enables the powerpc-specific perf_counter back-end. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linuxppc-dev@ozlabs.org Cc: benh@kernel.crashing.org LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15powerpc: Don't do generic calibrate_delay()Benjamin Herrenschmidt
Currently we are wasting time calling the generic calibrate_delay() function. We don't need it since our implementation of __delay() is based on the CPU timebase. So instead, we use our own small implementation that initializes loops_per_jiffy to something sensible to make the few users like spinlock debug be happy Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21powerpc: Improve decrementer accuracyAnton Blanchard
I have been looking at sources of OS jitter and notice that after a long NO_HZ idle period we wakeup too early: relative time (us) event timer irq exit 999946.405 timer irq entry 4.835 timer irq exit 21.685 timer irq entry 3.540 timer (tick_sched_timer) entry Here we slept for just under a second then took a timer interrupt that did nothing. 21.685 us later we wake up again and do the work. We set a rather low shift value of 16 for the decrementer clockevent, which I think is causing this issue. On this box we have a 207MHz decrementer and see: clockevent: decrementer mult[3501] shift[16] cpu[0] For calculations of large intervals this mult/shift combination could be off by a significant amount. I notice the sparc code has a loop that iterates to find a mult/shift combination that maximises the shift value while keeping mult under 32bit. With the patch below we get: clockevent: decrementer mult[35015c20] shift[32] cpu[15] And we no longer see the spurious wakeups. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-04-21clocksource: pass clocksource to read() callbackMagnus Damm
Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02powerpc: Hook up rtc-generic, and kill rtc-ppcGeert Uytterhoeven
PowerPC has been a long time user of the generic RTC abstraction, so hook up rtc-generic: - Create the "rtc-generic" platform device if ppc_md.get_rtc_time is set, - Kill rtc-ppc, as rtc-generic offers the same functionality in a more generic way, and supports autoloading through udev. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2009-01-03Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID [PATCH] improve idle cputime accounting [PATCH] improve precision of idle time detection. [PATCH] improve precision of process accounting. [PATCH] idle cputime accounting [PATCH] fix scaled & unscaled cputime accounting
2009-01-02Merge branch 'cpus4096-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually
2008-12-31[PATCH] idle cputime accountingMartin Schwidefsky
The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-31[PATCH] fix scaled & unscaled cputime accountingMartin Schwidefsky
The utimescaled / stimescaled fields in the task structure and the global cpustat should be set on all architectures. On s390 the calls to account_user_time_scaled and account_system_time_scaled never have been added. In addition system time that is accounted as guest time to the user time of a process is accounted to the scaled system time instead of the scaled user time. To fix the bugs and to prevent future forgetfulness this patch merges account_system_time_scaled into account_system_time and account_user_time_scaled into account_user_time. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Michael Neuling <mikey@neuling.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-13cpumask: convert struct clock_event_device to cpumask pointers.Rusty Russell
Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>
2008-11-05powerpc: Eliminate unused do_gtod variablePaul Mackerras
Since we started using the generic timekeeping code, we haven't had a powerpc-specific version of do_gettimeofday, and hence there is now nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c. This therefore removes it and the code that sets it. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05powerpc: Improve resolution of VDSO clock_gettimePaul Mackerras
Currently the clock_gettime implementation in the VDSO produces a result with microsecond resolution for the cases that are handled without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC. The nanoseconds field of the result is obtained by computing a microseconds value and multiplying by 1000. This changes the code in the VDSO to do the computation for clock_gettime with nanosecond resolution. That means that the resolution of the result will ultimately depend on the timebase frequency. Because the timestamp in the VDSO datapage (stamp_xsec, the real time corresponding to the timebase count in tb_orig_stamp) is in units of 2^-20 seconds, it doesn't have sufficient resolution for computing a result with nanosecond resolution. Therefore this adds a copy of xtime to the VDSO datapage and updates it in update_gtod() along with the other time-related fields. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-16Merge commit 'origin/master'Benjamin Herrenschmidt
Manual merge of: arch/powerpc/Kconfig arch/powerpc/kernel/stacktrace.c arch/powerpc/mm/slice.c arch/ppc/kernel/smp.c
2008-07-14powerpc/booke: don't reinitialize time baseKumar Gala
For some reason long ago I decided that we should zero out the time base when we calibrate the decrementer. The problem is that this can be harmful in SMP systems where the firmware has already synchronized the time bases on the various cores. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-06-26on_each_cpu(): kill unused 'retry' parameterJens Axboe
It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-14[POWERPC] Fix sparse warnings in arch/powerpc/kernelMichael Ellerman
Make a few things static in lparcfg.c Make init and exit routines static in rtas_flash.c Make things static in rtas_pci.c Make some functions static in rtas.c Make fops static in rtas-proc.c Remove unneeded extern for do_gtod in smp.c Make clocksource_init() static in time.c Make last_tick_len and ticklen_to_xs static in time.c Move the declaration of the pvr per-cpu into smp.h Make kexec_smp_down() and kexec_stack static in machine_kexec_64.c Don't return void in arch_teardown_msi_irqs() in msi.c Move declaration of GregorianDay()into asm/time.h Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-01ntp: rename TICK_LENGTH_SHIFT to NTP_SCALE_SHIFTRoman Zippel
As TICK_LENGTH_SHIFT is used for more than just the tick length, the name isn't quite approriate anymore, so this renames it to NTP_SCALE_SHIFT. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01ntp: increase time_freq resolutionRoman Zippel
This changes time_freq to a 64bit value and makes it static (the only outside user had no real need to modify it). Intermediate values were already 64bit, so the change isn't that big, but it saves a little in shifts by replacing SHIFT_NSEC with TICK_LENGTH_SHIFT. PPM_SCALE is then used to convert between user space and kernel space representation. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06taskstats scaled time cleanupMichael Neuling
This moves the ability to scale cputime into generic code. This allows us to fix the issue in kernel/timer.c (noticed by Balbir) where we could only add an unscaled value to the scaled utime/stime. This adds a cputime_to_scaled function. As before, the POWERPC version does the scaling based on the last SPURR/PURR ratio calculated. The generic and s390 (only other arch to implement asm/cputime.h) versions are both NOPs. Also moves the SPURR and PURR snapshots closer. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-21[POWERPC] Implement arch disable/enable irq hooks.Scott Wood
These hooks ensure that a decrementer interrupt is not pending when suspending; otherwise, problems may occur on 6xx/7xx/7xxx-based systems (except for powermacs, which use a separate suspend path). For example, with deep sleep on the 831x, a pending decrementer will cause a system freeze because the SoC thinks the decrementer interrupt would have woken the system, but the core must have interrupts disabled due to the setup required for deep sleep. Changed via-pmu.c to use the new ppc_md hooks, and made the arch_* functions call the generic_* functions unconditionally. -- paulus Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20[POWERPC] Optimize account_system_vtimeMilton Miller
We have multiple calls to has_feature being inlined, but gcc can't be sure that the store via get_paca() doesn't alias the path to cur_cpu_spec->feature. Reorder to put the calls to read_purr and read_spurr adjacent to each other. To add a sense of consistency, reorder the remaining lines to perform parallel steps on purr and scaled purr of each line instead of calculating and then using one value before going on to the next. In addition, we can tell gcc that no SPURR means no PURR. The test is completely hidden in the PURR case, and in the !PURR case the second test is eliminated resulting in the simple register copy in the out-of-line branch. Further, gcc sees get_paca()->system_time referenced several times and allocates a register to address it (shadowing r13) instead of caching its value. Reading into a local varable saves the shadow of r13 and removes a potentially duplicate load (between the nested if and its parent). Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20[POWERPC] Depend on ->initialized in calc_steal_timeMilton Miller
If CPU_FTR_PURR is not set, we will never set cpu_purr_data->initialized. Checking via __get_cpu_var on 64 bit avoids one dependent load compared to cpu_has_feature in the not-present case, and is always required when it is present. The code is under CONFIG_VIRT_CPU_ACCOUNTING so 32 bit will not be affected. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20[POWERPC] Timer interrupt: use a struct for two per_cpu varablesMilton Miller
timer_interrupt() was calculating per_cpu_offset several times, having to start from the toc because of potential aliasing issues. Placing both decrementer per_cpu varables in a struct and calculating the address once with __get_cpu_var results in better code on both 32 and 64 bit. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20[POWERPC] Use __get_cpu_var in time.cMilton Miller
Use __get_cpu_var(x) instead of per_cpu(x, smp_processor_id()), as it is optimized on ppc64 to access the current cpu's per-cpu offset directly; it's local_paca.offset instead of TOC->paca[local_paca->processor_id].offset. This is the trivial portion, two functions with one use each. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20[POWERPC] init_decrementer_clockevent can be static __initMilton Miller
as its only called from time_init, which is __init. Also remove unneeded forward declaration. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>