summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2011-09-29rcu: Fix mismatched variable in rcutree_trace.cAndi Kleen
rcutree.c defines rcu_cpu_kthread_cpu as int, not unsigned int, so the extern has to follow that. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-29rcu: Restore checks for blocking in RCU read-side critical sectionsPaul E. McKenney
Long ago, using TREE_RCU with PREEMPT would result in "scheduling while atomic" diagnostics if you blocked in an RCU read-side critical section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats this diagnostic. This commit therefore adds a replacement diagnostic based on PROVE_RCU. Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being used for things that have nothing to do with rcu_dereference(), rename lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third argument that is a string indicating what is suspicious. This third argument is passed in from a new third argument to rcu_lockdep_assert(). Update all calls to rcu_lockdep_assert() to add an informative third argument. Also, add a pair of rcu_lockdep_assert() calls from within rcu_note_context_switch(), one complaining if a context switch occurs in an RCU-bh read-side critical section and another complaining if a context switch occurs in an RCU-sched read-side critical section. These are present only if the PROVE_RCU kernel parameter is enabled. Finally, fix some checkpatch whitespace complaints in lockdep.c. Again, you must enable PROVE_RCU to see these new diagnostics. But you are enabling PROVE_RCU to check out new RCU uses in any case, aren't you? Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-29rcu: Avoid unnecessary self-wakeup of per-CPU kthreadsShaohua Li
There are a number of cases where the RCU can find additional work for the per-CPU kthread within the context of that per-CPU kthread. In such cases, the per-CPU kthread is already running, so attempting to wake itself up does nothing except waste CPU cycles. This commit therefore checks to see if it is in the per-CPU kthread context, omitting the wakeup in this case. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-29rcu: Use kthread_create_on_node()Eric Dumazet
Commit a26ac2455ffc (move TREE_RCU from softirq to kthread) added per-CPU kthreads. However, kthread creation uses kthread_create(), which can put the kthread's stack and task struct on the wrong NUMA node. Therefore, use kthread_create_on_node() instead of kthread_create() so that the stacks and task structs are placed on the correct NUMA node. A similar change was carried out in commit 94dcf29a11b3 (kthread: use kthread_create_on_node()). Also change rcutorture's priority-boost-test kthread creation. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tejun Heo <tj@kernel.org> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Andrew Morton <akpm@linux-foundation.org> CC: Andi Kleen <ak@linux.intel.com> CC: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28connector: add comm change event report to proc connectorVladimir Zapolskiy
Add an event to monitor comm value changes of tasks. Such an event becomes vital, if someone desires to control threads of a process in different manner. A natural characteristic of threads is its comm value, and helpfully application developers have an opportunity to change it in runtime. Reporting about such events via proc connector allows to fine-grain monitoring and control potentials, for instance a process control daemon listening to proc connector and following comm value policies can place specific threads to assigned cgroup partitions. It might be possible to achieve a pale partial one-shot likeness without this update, if an application changes comm value of a thread generator task beforehand, then a new thread is cloned, and after that proc connector listener gets the fork event and reads new thread's comm value from procfs stat file, but this change visibly simplifies and extends the matter. Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27PM / Runtime: Introduce trace points for tracing rpm_* functionsMing Lei
This patch introduces 3 trace points to prepare for tracing rpm_idle/rpm_suspend/rpm_resume functions, so we can use these trace points to replace the current dev_dbg(). Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-09-27treewide: Correct spelling of successfully in commentsJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-26sched: Remove redundant test in check_preempt_tick()Wang Xingchao
The caller already checks for nr_running > 1, therefore we don't have to do so again. Signed-off-by: Wang Xingchao <xingchao.wang@intel.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1316194552-12019-1-git-send-email-xingchao.wang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26Merge commit 'v3.1-rc7' into perf/coreIngo Molnar
Merge reason: Pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26sched: Fix up wchan borkageSimon Kirby
Commit c259e01a1ec ("sched: Separate the scheduler entry for preemption") contained a boo-boo wrecking wchan output. It forgot to put the new schedule() function in the __sched section and thereby doesn't get properly ignored for things like wchan. Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@kernel.org # 2.6.39+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26Merge branch 'for_3_2/for-rmk/arm_cpu_pm' of ↵Russell King
git://gitorious.org/omap-sw-develoment/linux-omap-dev into devel-stable
2011-09-25ptrace: PTRACE_LISTEN forgets to unlock ->siglockOleg Nesterov
If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop ->siglock. Reported-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-23cpu_pm: call notifiers during suspendColin Cross
Implements syscore_ops in cpu_pm to call the cpu and cpu cluster notifiers during suspend and resume, allowing drivers receiving the notifications to avoid implementing syscore_ops. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Tested-and-Acked-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Vishwanath BS <vishwanath.bs@ti.com>
2011-09-23cpu_pm: Add cpu power management notifiersColin Cross
During some CPU power modes entered during idle, hotplug and suspend, peripherals located in the CPU power domain, such as the GIC, localtimers, and VFP, may be powered down. Add a notifier chain that allows drivers for those peripherals to be notified before and after they may be reset. Notified drivers can include VFP co-processor, interrupt controller and it's PM extensions, local CPU timers context save/restore which shouldn't be interrupted. Hence CPU PM event APIs must be called with interrupts disabled. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Tested-and-Acked-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Kevin Hilman <khilman@ti.com> Tested-by: Vishwanath BS <vishwanath.bs@ti.com>
2011-09-22tracing: Fix preemptirqsoff tracer to not stop at preempt offSteven Rostedt
If irqs are disabled when preemption count reaches zero, the preemptirqsoff tracer should not flag that as the end. When interrupts are enabled and preemption count is not zero the preemptirqsoff correctly continues its tracing. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-09-21time: Change jiffies_to_clock_t() argument type to unsigned longhank
The parameter's origin type is long. On an i386 architecture, it can easily be larger than 0x80000000, causing this function to convert it to a sign-extended u64 type. Change the type to unsigned long so we get the correct result. Signed-off-by: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: <stable@kernel.org> [ build fix ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21Merge branch 'kprobes-test' of git://git.yxit.co.uk/linux into devel-stableRussell King
2011-09-20irq: Fix check for already initialized irq_domain in irq_domain_addRob Herring
The sanity check in irq_domain_add() tests desc->irq_data != NULL or irq_data->domain != NULL. This prevents adding an irq_domain to a irq descriptor when irq_data exists, which true when the irq descriptor exists. This went unnoticed so far as the simple domain code did not enter this code path because domain->nr_irqs is always 0 for the simple domains. Split the check for irq_data == NULL out and have a separate warning for it. [ tglx: Made the check for irq_data == NULL separate ] Signed-off-by: Rob Herring <rob.herring@calxeda.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: marc.zyngier@arm.com Cc: thomas.abraham@linaro.org Cc: jamie@jamieiles.com Cc: b-cousson@ti.com Cc: shawn.guo@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: devicetree-discuss@lists.ozlabs.org Link: http://lkml.kernel.org/r/1316017900-19918-3-git-send-email-robherring2@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-20sched: Allow SD_NODES_PER_DOMAIN to be overriddenAnton Blanchard
We want to override the default value of SD_NODES_PER_DOMAIN on ppc64, so move it into linux/topology.h. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20Merge branch 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tipLinus Torvalds
* 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86, iommu: Mark DMAR IRQ as non-threaded genirq: Make irq_shutdown() symmetric vs. irq_startup again
2011-09-20Make taskstats round statistics down to nearest 1k bytes/eventsLinus Torvalds
Even with just the interface limited to admin, there really is little to reason to give byte-per-byte counts for taskstats. So round it down to something less intrusive. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-20Make TASKSTATS require root accessLinus Torvalds
Ok, this isn't optimal, since it means that 'iotop' needs admin capabilities, and we may have to work on this some more. But at the same time it is very much not acceptable to let anybody just read anybody elses IO statistics quite at this level. Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative to checking the capabilities by hand. Reported-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Johannes Berg <johannes.berg@intel.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-19tracing: Add a counter clock for those that do not trust clocksSteven Rostedt
When debugging tight race conditions, it can be helpful to have a synchronized tracing method. Although in most cases the global clock provides this functionality, if timings is not the issue, it is more comforting to know that the order of events really happened in a precise order. Instead of using a clock, add a "counter" that is simply an incrementing atomic 64bit counter that orders the events as they are perceived to happen. The trace_clock_counter() is added from the attempt by Peter Zijlstra trying to convert the trace_clock_global() to it. I took Peter's counter code and made trace_clock_counter() instead, and added it to the choice of clocks. Just echo counter > /debug/tracing/trace_clock to activate it. Requested-by: Thomas Gleixner <tglx@linutronix.de> Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-By: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-09-18watchdog: Drop FIFO policy in exit pathThomas Gleixner
When the watchdog thread exits it runs through the exit path with FIFO priority. There is no point in doing so. Switch back to SCHED_NORMAL before exiting. Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109121337461.2723@ionos Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-18Merge branch 'linus' into sched/coreIngo Molnar
Merge reason: We are queueing up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-18lockdep: Comment all warningsPeter Zijlstra
Andrew requested I comment all the lockdep WARN()s to help other people figure out wth is wrong.. Requested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1315301493.3191.9.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-18sched/rt: Migrate equal priority tasks to available CPUsShawn Bohrer
Commit 43fa5460fe60dea5c610490a1d263415419c60f6 ("sched: Try not to migrate higher priority RT tasks") also introduced a change in behavior which keeps RT tasks on the same CPU if there is an equal priority RT task currently running even if there are empty CPUs available. This can cause unnecessary wakeup latencies, and can prevent the scheduler from balancing all RT tasks across available CPUs. This change causes an RT task to search for a new CPU if an equal priority RT task is already running on wakeup. Lower priority tasks will still have to wait on higher priority tasks, but the system should still balance out because there is always the possibility that if there are both a high and low priority RT tasks on a given CPU that the high priority task could wakeup while the low priority task is running and force it to search for a better runqueue. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org # 37+ Link: http://lkml.kernel.org/r/1315837684-18733-1-git-send-email-sbohrer@rgmadvisors.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-15Merge branch 'master' into for-nextJiri Kosina
Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.
2011-09-15futex: Fix spelling in a source code commentBart Van Assche
Change a single occurrence of "unlcoked" into "unlocked". Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15futex: uninitialized warning correctionsVitaliy Ivanov
The variables here are really not used uninitialized. kernel/futex.c: In function 'fixup_pi_state_owner.clone.17': kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function kernel/futex.c: In function 'handle_futex_death': kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function kernel/futex.c: In function 'do_futex': kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function kernel/futex.c:828:6: note: 'curval' was declared here kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function kernel/futex.c:890:6: note: 'oldval' was declared here Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15async: uninitialized warning correctionsVitaliy Ivanov
The variables here are really not used uninitialized. kernel/async.c: In function 'async_synchronize_cookie_domain': kernel/async.c:270:10: warning: 'starttime.tv64' may be used uninitialized in this function kernel/async.c: In function 'async_run_entry_fn': kernel/async.c:122:10: warning: 'calltime.tv64' may be used uninitialized in this function Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15workqueue: lock cwq access in drain_workqueueThomas Tuttle
Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing and then incrementing nr_active when it activates a delayed work. We discovered this when a corner case in one of our drivers resulted in us trying to destroy a workqueue in which the remaining work would always requeue itself again in the same workqueue. We would hit this race condition and trip the BUG_ON on workqueue.c:3080. Signed-off-by: Thomas Tuttle <ttuttle@chromium.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-14alarmtimers: Fix error handlingThomas Gleixner
commit 8bc0daf (alarmtimers: Rework RTC device selection using class interface) did not implement required error checks. Add them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-13locking, latencytop: Annotate latency_lock as rawThomas Gleixner
The latency_lock is lock can be taken in the guts of the scheduler code and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, timer_stats: Annotate table_lock as rawThomas Gleixner
The table_lock lock can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Reported-by: Andreas Sundebo <kernel@sundebo.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andreas Sundebo <kernel@sundebo.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, semaphores: Annotate inner lock as rawThomas Gleixner
There is no reason to have the spin_lock protecting the semaphore preemptible on -rt. Annotate it as a raw_spinlock. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. ( On rt this also solves lockdep complaining about the rt_mutex.wait_lock being not initialized. ) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, sched: Annotate thread_group_cputimer as rawThomas Gleixner
The thread_group_cputimer lock can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, printk: Annotate logbuf_lock as rawThomas Gleixner
The logbuf_lock lock can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ merged and fixed it ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, tracing: Annotate tracing locks as rawThomas Gleixner
The tracing locks can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, sched, cgroups: Annotate release_list_lock as rawThomas Gleixner
The release_list_lock can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13locking, kprobes: Annotate the hash locks and kretprobe.lock as rawThomas Gleixner
The kprobe locks can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-13clocksource: Make watchdog reset locklessThomas Gleixner
KGDB needs to trylock watchdog_lock when trying to reset the clocksource watchdog after the system has been stopped to avoid a potential deadlock. When the trylock fails TSC usually becomes unstable. We can be more clever by using an atomic counter and checking it in the clocksource_watchdog callback. We restart the watchdog whenever the counter is > 0 and only decrement the counter when we ran through a full update cycle. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Acked-by: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109121326280.2723@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-12rtmutex: Cleanup the debug codeThomas Gleixner
Use the existing lock debugging macros. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-12genirq: Add IRQCHIP_SKIP_SET_WAKE flagSantosh Shilimkar
Some irq chips need the irq_set_wake() functionality, but do not require a irq_set_wake() callback. Instead of forcing an empty callback to be implemented add a flag which notes this fact. Check for the flag in set_irq_wake_real() and return success when set. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Thomas Gleixner <tglx@linutronix.de>
2011-09-12genirq: Make irq_shutdown() symmetric vs. irq_startup againGeert Uytterhoeven
If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or .irq_mask(), free_irq() crashes when jumping to NULL. Fix this by only trying .irq_disable() and .irq_mask() if there's no .irq_shutdown() provided. This revives the symmetry with irq_startup(), which tries .irq_startup(), .irq_enable(), and irq_unmask(), and makes it consistent with the comment for irq_chip.irq_shutdown() in <linux/irq.h>, which says: * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL) This is also how __free_irq() behaved before the big overhaul, cfr. e.g. 3b56f0585fd4c02d047dc406668cb40159b2d340 ("genirq: Remove bogus conditional"), where the core interrupt code always overrode .irq_shutdown() to .irq_disable() if .irq_shutdown() was NULL. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-m68k@lists.linux-m68k.org Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-08posix-cpu-timers: Cure SMP accounting odditiesPeter Zijlstra
David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-08clockevents: Add direct ktime programming functionMartin Schwidefsky
There is at least one architecture (s390) with a sane clockevent device that can be programmed with the equivalent of a ktime. No need to create a delta against the current time, the ktime can be used directly. A new clock device function 'set_next_ktime' is introduced that is called with the unmodified ktime for the timer if the clock event device has the CLOCK_EVT_FEAT_KTIME bit set. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20110823133142.815350967@de.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-08clockevents: Make minimum delay adjustments configurableMartin Schwidefsky
The automatic increase of the min_delta_ns of a clockevents device should be done in the clockevents code as the minimum delay is an attribute of the clockevents device. In addition not all architectures want the automatic adjustment, on a massively virtualized system it can happen that the programming of a clock event fails several times in a row because the virtual cpu has been rescheduled quickly enough. In that case the minimum delay will erroneously be increased with no way back. The new config symbol GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic adjustment. The config option is selected only for x86. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-08nohz: Remove "Switched to NOHz mode" debugging messagesHeiko Carstens
When performing cpu hotplug tests the kernel printk log buffer gets flooded with pointless "Switched to NOHz mode..." messages. Especially when afterwards analyzing a dump this might have removed more interesting stuff out of the buffer. Assuming that switching to NOHz mode simply works just remove the printk. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-08nohz: Make idle/iowait counter update conditionalMichal Hocko
get_cpu_{idle,iowait}_time_us update idle/iowait counters unconditionally if the given CPU is in the idle loop. This doesn't work well outside of CPU governors which are singletons so nobody (except for IRQ) can race with them. We will need to use both functions from /proc/stat handler to properly handle nohz idle/iowait times. Make the update depend on a non NULL last_update_time argument. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Dave Jones <davej@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Link: http://lkml.kernel.org/r/11f23179472635ce52e78921d47a20216b872f23.1314172057.git.mhocko@suse.cz Signed-off-by: Thomas Gleixner <tglx@linutronix.de>