summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2015-02-13stomp-machine-raw-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13stop_machine: convert stop_machine_run() to PREEMPT_RTIngo Molnar
Instead of playing with non-preemption, introduce explicit startup serialization. This is more robust and cleaner as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lockSteven Rostedt
In -rt, most spin_locks() turn into mutexes. One of these spin_lock conversions is performed on the workqueue gcwq->lock. When the idle worker is worken, the first thing it will do is grab that same lock and it too will block, possibly jumping into the same code, but because nr_running would already be decremented it prevents an infinite loop. But this is still a waste of CPU cycles, and it doesn't follow the method of mainline, as new workers should only be woken when a worker thread is truly going to sleep, and not just blocked on a spin_lock(). Check the saved_state too before waking up new workers. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13sched: ttwu: Return success when only changing the saved_state valueThomas Gleixner
When a task blocks on a rt lock, it saves the current state in p->saved_state, so a lock related wake up will not destroy the original state. When a real wakeup happens, while the task is running due to a lock wakeup already, we update p->saved_state to TASK_RUNNING, but we do not return success, which might cause another wakeup in the waitqueue code and the task remains in the waitqueue list. Return success in that case as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2015-02-13sched-disable-ttwu-queue.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13cond-resched-softirq-fix.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched-cond-resched.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched-might-sleep-do-not-account-rcu-depth.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched-rt-mutex-wakeup.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched-mmdrop-delayed.patchThomas Gleixner
Needs thread context (pgd_lock) -> ifdeffed. workqueues wont work with RT Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched-limit-nr-migrate.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched-delay-put-task.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13posix-timers: Avoid wakeups when no timers are activeThomas Gleixner
Waking the thread even when no timers are scheduled is useless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13posix-timers: Shorten posix_cpu_timers/<CPU> kernel thread namesArnaldo Carvalho de Melo
Shorten the softirq kernel thread names because they always overflow the limited comm length, appearing as "posix_cpu_timer" CPU# times. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13posix-timers: thread posix-cpu-timers on -rtJohn Stultz
posix-cpu-timer code takes non -rt safe locks in hard irq context. Move it to a thread. [ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13hrtimer: Move schedule_work call to helper threadYang Shi
When run ltp leapsec_timer test, the following call trace is caught: BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2 Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010 ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58 ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0 ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200 Call Trace: <IRQ> [<ffffffff8169918d>] dump_stack+0x19/0x1b [<ffffffff8106db31>] __might_sleep+0xf1/0x170 [<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50 [<ffffffff81059da1>] queue_work_on+0x61/0x100 [<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30 [<ffffffff810883be>] do_timer+0x40e/0x660 [<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140 [<ffffffff8108fe42>] tick_check_idle+0x92/0xc0 [<ffffffff81044327>] irq_enter+0x57/0x70 [<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b [<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70 <EOI> [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0 [<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0 [<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30 [<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310 [<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0 The clock_was_set_delayed is called in hard IRQ handler (timer interrupt), which calls schedule_work. Under PREEMPT_RT_FULL, schedule_work calls spinlocks which could sleep, so it's not safe to call schedule_work in interrupt context. Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca (rt,ntp: Move call to schedule_delayed_work() to helper thread) from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git, which makes a similar change. add a helper thread which does the call to schedule_work and wake up that thread instead of calling schedule_work directly. Cc: stable-rt@vger.kernel.org Signed-off-by: Yang Shi <yang.shi@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13hrtimer: Raise softirq if hrtimer irq stalledWatanabe
When the hrtimer stall detection hits the softirq is not raised. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2015-02-13hrtimer: fixup hrtimer callback changes for preempt-rtThomas Gleixner
In preempt-rt we can not call the callbacks which take sleeping locks from the timer interrupt context. Bring back the softirq split for now, until we fixed the signal delivery problem for real. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2015-02-13hrtimers: prepare full preemptionIngo Molnar
Make cancellation of a running callback in softirq context safe against preemption. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13timers: Avoid the switch timers base set to NULL trick on RTThomas Gleixner
On RT that code is preemptible, so we cannot assign NULL to timers base as a preempter would spin forever in lock_timer_base(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13timer: delay waking softirqs from the jiffy tickPeter Zijlstra
People were complaining about broken balancing with the recent -rt series. A look at /proc/sched_debug yielded: cpu#0, 2393.874 MHz .nr_running : 0 .load : 0 .cpu_load[0] : 177522 .cpu_load[1] : 177522 .cpu_load[2] : 177522 .cpu_load[3] : 177522 .cpu_load[4] : 177522 cpu#1, 2393.874 MHz .nr_running : 4 .load : 4096 .cpu_load[0] : 181618 .cpu_load[1] : 180850 .cpu_load[2] : 180274 .cpu_load[3] : 179938 .cpu_load[4] : 179758 Which indicated the cpu_load computation was hosed, the 177522 value indicates that there is one RT task runnable. Initially I thought the old problem of calculating the cpu_load from a softirq had re-surfaced, however looking at the code shows its being done from scheduler_tick(). [ we really should fix this RT/cfs interaction some day... ] A few trace_printk()s later: sirq-timer/1-19 [001] 174.289744: 19: 50:S ==> [001] 0:140:R <idle> <idle>-0 [001] 174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522 <idle>-0 [001] 174.290725: 0:140:R + [001] 19: 50:S sirq-timer/1 <idle>-0 [001] 174.290730: scheduler_tick: current load: 177522 <idle>-0 [001] 174.290732: scheduler_tick: current: 0/swapper <idle>-0 [001] 174.290736: 0:140:R ==> [001] 19: 50:R sirq-timer/1 sirq-timer/1-19 [001] 174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522 sirq-timer/1-19 [001] 174.290743: 19: 50:S ==> [001] 0:140:R <idle> We see that we always raise the timer softirq before doing the load calculation. Avoid this by re-ordering the scheduler_tick() call in update_process_times() to occur before we deal with timers. This lowers the load back to sanity and restores regular load-balancing behaviour. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13timers: preempt-rt supportIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13timers: prepare for full preemption improveZhao Hongjiang
wake_up should do nothing on the nort, so we should use wakeup_timer_waiters, also fix a spell mistake. Cc: stable-rt@vger.kernel.org Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> [bigeasy: s/CONFIG_PREEMPT_RT_BASE/CONFIG_PREEMPT_RT_FULL/] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13timers: prepare for full preemptionIngo Molnar
When softirqs can be preempted we need to make sure that cancelling the timer from the active thread can not deadlock vs. a running timer callback. Add a waitqueue to resolve that. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13relay: fix timer madnessIngo Molnar
remove timer calls (!!!) from deep within the tracing infrastructure. This was totally bogus code that can cause lockups and worse. Poll the buffer every 2 jiffies for now. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13panic: skip get_random_bytes for RT_FULL in init_oops_idThomas Gleixner
2015-02-13genirq: do not invoke the affinity callback via a workqueueSebastian Andrzej Siewior
Joe Korty reported, that __irq_set_affinity_locked() schedules a workqueue while holding a rawlock which results in a might_sleep() warning. This patch moves the invokation into a process context so that we only wakeup() a process while holding the lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13genirq-force-threading.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13genirq: disable irqpoll on -rtIngo Molnar
Creates long latencies for no value Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13signal-fix-up-rcu-wreckage.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13core: Do not disable interrupts on RT in res_counter.cIngo Molnar
Frederic Weisbecker reported this warning: [ 45.228562] BUG: sleeping function called from invalid context at kernel/rtmutex.c:683 [ 45.228571] in_atomic(): 0, irqs_disabled(): 1, pid: 4290, name: ntpdate [ 45.228576] INFO: lockdep is turned off. [ 45.228580] irq event stamp: 0 [ 45.228583] hardirqs last enabled at (0): [<(null)>] (null) [ 45.228589] hardirqs last disabled at (0): [<ffffffff8025449d>] copy_process+0x68d/0x1500 [ 45.228602] softirqs last enabled at (0): [<ffffffff8025449d>] copy_process+0x68d/0x1500 [ 45.228609] softirqs last disabled at (0): [<(null)>] (null) [ 45.228617] Pid: 4290, comm: ntpdate Tainted: G W 2.6.29-rc4-rt1-tip #1 [ 45.228622] Call Trace: [ 45.228632] [<ffffffff8027dfb0>] ? print_irqtrace_events+0xd0/0xe0 [ 45.228639] [<ffffffff8024cd73>] __might_sleep+0x113/0x130 [ 45.228646] [<ffffffff8077c811>] rt_spin_lock+0xa1/0xb0 [ 45.228653] [<ffffffff80296a3d>] res_counter_charge+0x5d/0x130 [ 45.228660] [<ffffffff802fb67f>] __mem_cgroup_try_charge+0x7f/0x180 [ 45.228667] [<ffffffff802fc407>] mem_cgroup_charge_common+0x57/0x90 [ 45.228674] [<ffffffff80212096>] ? ftrace_call+0x5/0x2b [ 45.228680] [<ffffffff802fc49d>] mem_cgroup_newpage_charge+0x5d/0x60 [ 45.228688] [<ffffffff802d94ce>] __do_fault+0x29e/0x4c0 [ 45.228694] [<ffffffff8077c843>] ? rt_spin_unlock+0x23/0x80 [ 45.228700] [<ffffffff802db8b5>] handle_mm_fault+0x205/0x890 [ 45.228707] [<ffffffff80212096>] ? ftrace_call+0x5/0x2b [ 45.228714] [<ffffffff8023495e>] do_page_fault+0x11e/0x2a0 [ 45.228720] [<ffffffff8077e5a5>] page_fault+0x25/0x30 [ 45.228727] [<ffffffff8043e1ed>] ? __clear_user+0x3d/0x70 [ 45.228733] [<ffffffff8043e1d1>] ? __clear_user+0x21/0x70 The reason is the raw IRQ flag use of kernel/res_counter.c. The irq flags tricks there seem a bit pointless: it cannot protect the c->parent linkage because local_irq_save() is only per CPU. So replace it with _nort(). This code needs a second look. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13core: Do not disable interrupts on RT in kernel/users.cThomas Gleixner
Use the local_irq_*_nort variants to reduce latencies in RT. The code is serialized by the locks. No need to disable interrupts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13kconfig-preempt-rt-full.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rt-preempt-base-config.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13printk: 'force_early_printk' boot param to help with debuggingPeter Zijlstra
Gives me an option to screw printk and actually see what the machine says. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314967289.1301.11.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-ykb97nsfmobq44xketrxs977@git.kernel.org
2015-02-13printk-kill.patchIngo Molnar
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13latency-hist.patchCarsten Emde
This patch provides a recording mechanism to store data of potential sources of system latencies. The recordings separately determine the latency caused by a delayed timer expiration, by a delayed wakeup of the related user space program and by the sum of both. The histograms can be enabled and reset individually. The data are accessible via the debug filesystem. For details please consult Documentation/trace/histograms.txt. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13softirq-split-out-code.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13suspend: Prevent might sleep splatsThomas Gleixner
timekeeping suspend/resume calls read_persistant_clock() which takes rtc_lock. That results in might sleep warnings because at that point we run with interrupts disabled. We cannot convert rtc_lock to a raw spinlock as that would trigger other might sleep warnings. As a temporary workaround we disable the might sleep warnings by setting system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and restoring it to SYSTEM_RUNNING afer sysdev_resume(). Needs to be revisited. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13mm: pagefault_disabled()Peter Zijlstra
Wrap the test for pagefault_disabled() into a helper, this allows us to remove the need for current->pagefault_disabled on !-rt kernels. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3yy517m8zsi9fpsf14xfaqkw@git.kernel.org
2015-02-13mm: Prepare decoupling the page fault disabling logicIngo Molnar
Add a pagefault_disabled variable to task_struct to allow decoupling the pagefault-disabled logic from the preempt count. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13signal/x86: Delay calling signals in atomicOleg Nesterov
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored on the stacks task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Cc: stable-rt@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13signals: Allow rt tasks to cache one sigqueue structThomas Gleixner
To avoid allocation allow rt tasks to cache one sigqueue struct in task struct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13posix-timers: Prevent broadcast signalsThomas Gleixner
Posix timers should not send broadcast signals and kernel only signals. Prevent it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13ptrace: fix ptrace vs tasklist_lock raceSebastian Andrzej Siewior
As explained by Alexander Fyodorov <halcy@yandex.ru>: |read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel, |and it can remove __TASK_TRACED from task->state (by moving it to |task->saved_state). If parent does wait() on child followed by a sys_ptrace |call, the following race can happen: | |- child sets __TASK_TRACED in ptrace_stop() |- parent does wait() which eventually calls wait_task_stopped() and returns | child's pid |- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves | __TASK_TRACED flag to saved_state |- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive() The patch is based on his initial patch where an additional check is added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is taken in case the caller is interrupted between looking into ->state and ->saved_state. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13signal-revert-ptrace-preempt-magic.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13tracing: Account for preempt off in preempt_schedule()Steven Rostedt
The preempt_schedule() uses the preempt_disable_notrace() version because it can cause infinite recursion by the function tracer as the function tracer uses preempt_enable_notrace() which may call back into the preempt_schedule() code as the NEED_RESCHED is still set and the PREEMPT_ACTIVE has not been set yet. See commit: d1f74e20b5b064a130cd0743a256c2d3cfe84010 that made this change. The preemptoff and preemptirqsoff latency tracers require the first and last preempt count modifiers to enable tracing. But this skips the checks. Since we can not convert them back to the non notrace version, we can use the idle() hooks for the latency tracers here. That is, the start/stop_critical_timings() works well to manually start and stop the latency tracer for preempt off timings. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13vtime-split-lock-and-seqcount.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13timekeeping-split-jiffies-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sched: Consider pi boosting in setschedulerThomas Gleixner
If a PI boosted task policy/priority is modified by a setscheduler() call we unconditionally dequeue and requeue the task if it is on the runqueue even if the new priority is lower than the current effective boosted priority. This can result in undesired reordering of the priority bucket list. If the new priority is less or equal than the current effective we just store the new parameters in the task struct and leave the scheduler class and the runqueue untouched. This is handled when the task deboosts itself. Only if the new priority is higher than the effective boosted priority we apply the change immediately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org