path: root/kernel
Age | Commit message | Author
2014-05-14timers: prepare for full preemptionIngo Molnar
When softirqs can be preempted we need to make sure that cancelling the timer from the active thread cannot deadlock against a running timer callback. Add a waitqueue to resolve that. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
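A minimal sketch of that idea, assuming illustrative names (the struct, field and function names below are not the actual 3.12-rt identifiers): the cancelling thread sleeps on a waitqueue until the base no longer marks the timer as running, instead of spinning, so a preempted softirq thread cannot deadlock it.

  #include <linux/timer.h>
  #include <linux/wait.h>

  /* Illustrative stand-in for the per-CPU timer base. */
  struct timer_base_sketch {
      struct timer_list *running_timer;
  };

  static DECLARE_WAIT_QUEUE_HEAD(timer_wait);

  /* Canceller side: sleep instead of busy-waiting for the callback. */
  static void wait_for_running_timer(struct timer_base_sketch *base,
                                     struct timer_list *timer)
  {
      wait_event(timer_wait, base->running_timer != timer);
  }

  /* Callback side: clear the marker and wake any blocked canceller. */
  static void timer_callback_done(struct timer_base_sketch *base)
  {
      base->running_timer = NULL;
      wake_up_all(&timer_wait);
  }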
2014-05-14relay: fix timer madnessIngo Molnar
Remove timer calls (!!!) from deep within the tracing infrastructure. This was totally bogus code that can cause lockups and worse. Poll the buffer every 2 jiffies for now. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14panic: skip get_random_bytes for RT_FULL in init_oops_idThomas Gleixner
2014-05-14genirq: do not invoke the affinity callback via a workqueueSebastian Andrzej Siewior
Joe Korty reported that __irq_set_affinity_locked() schedules a workqueue while holding a raw lock, which results in a might_sleep() warning. This patch moves the invocation into process context so that we only wakeup() a process while holding the lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14genirq-force-threading.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14genirq: disable irqpoll on -rtIngo Molnar
Creates long latencies for no value Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14signal-fix-up-rcu-wreckage.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14core: Do not disable interrupts on RT in res_counter.cIngo Molnar
Frederic Weisbecker reported this warning:

  [ 45.228562] BUG: sleeping function called from invalid context at kernel/rtmutex.c:683
  [ 45.228571] in_atomic(): 0, irqs_disabled(): 1, pid: 4290, name: ntpdate
  [ 45.228576] INFO: lockdep is turned off.
  [ 45.228580] irq event stamp: 0
  [ 45.228583] hardirqs last enabled at (0): [<(null)>] (null)
  [ 45.228589] hardirqs last disabled at (0): [<ffffffff8025449d>] copy_process+0x68d/0x1500
  [ 45.228602] softirqs last enabled at (0): [<ffffffff8025449d>] copy_process+0x68d/0x1500
  [ 45.228609] softirqs last disabled at (0): [<(null)>] (null)
  [ 45.228617] Pid: 4290, comm: ntpdate Tainted: G W 2.6.29-rc4-rt1-tip #1
  [ 45.228622] Call Trace:
  [ 45.228632] [<ffffffff8027dfb0>] ? print_irqtrace_events+0xd0/0xe0
  [ 45.228639] [<ffffffff8024cd73>] __might_sleep+0x113/0x130
  [ 45.228646] [<ffffffff8077c811>] rt_spin_lock+0xa1/0xb0
  [ 45.228653] [<ffffffff80296a3d>] res_counter_charge+0x5d/0x130
  [ 45.228660] [<ffffffff802fb67f>] __mem_cgroup_try_charge+0x7f/0x180
  [ 45.228667] [<ffffffff802fc407>] mem_cgroup_charge_common+0x57/0x90
  [ 45.228674] [<ffffffff80212096>] ? ftrace_call+0x5/0x2b
  [ 45.228680] [<ffffffff802fc49d>] mem_cgroup_newpage_charge+0x5d/0x60
  [ 45.228688] [<ffffffff802d94ce>] __do_fault+0x29e/0x4c0
  [ 45.228694] [<ffffffff8077c843>] ? rt_spin_unlock+0x23/0x80
  [ 45.228700] [<ffffffff802db8b5>] handle_mm_fault+0x205/0x890
  [ 45.228707] [<ffffffff80212096>] ? ftrace_call+0x5/0x2b
  [ 45.228714] [<ffffffff8023495e>] do_page_fault+0x11e/0x2a0
  [ 45.228720] [<ffffffff8077e5a5>] page_fault+0x25/0x30
  [ 45.228727] [<ffffffff8043e1ed>] ? __clear_user+0x3d/0x70
  [ 45.228733] [<ffffffff8043e1d1>] ? __clear_user+0x21/0x70

The reason is the raw IRQ flag use of kernel/res_counter.c. The irq flags tricks there seem a bit pointless: it cannot protect the c->parent linkage because local_irq_save() is only per CPU. So replace it with _nort(). This code needs a second look.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14core: Do not disable interrupts on RT in kernel/users.cThomas Gleixner
Use the local_irq_*_nort variants to reduce latencies in RT. The code is serialized by the locks. No need to disable interrupts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
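For illustration, the _nort() helpers can be defined along these lines (a sketch; the exact -rt definitions may differ): on a non-RT kernel they are the plain local_irq_* operations, while on PREEMPT_RT_FULL they degrade to (almost) no-ops because the data is already serialized by the (sleeping) locks.

  #include <linux/irqflags.h>

  #ifdef CONFIG_PREEMPT_RT_FULL
  /* RT: interrupts stay enabled, the locks provide the serialization */
  # define local_irq_disable_nort()       barrier()
  # define local_irq_enable_nort()        barrier()
  # define local_irq_save_nort(flags)     do { local_save_flags(flags); } while (0)
  # define local_irq_restore_nort(flags)  do { (void)(flags); } while (0)
  #else
  /* !RT: identical to the regular local_irq_* operations */
  # define local_irq_disable_nort()       local_irq_disable()
  # define local_irq_enable_nort()        local_irq_enable()
  # define local_irq_save_nort(flags)     local_irq_save(flags)
  # define local_irq_restore_nort(flags)  local_irq_restore(flags)
  #endif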
2014-05-14kconfig-preempt-rt-full.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14rt-preempt-base-config.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14printk: 'force_early_printk' boot param to help with debuggingPeter Zijlstra
Gives me an option to screw printk and actually see what the machine says. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314967289.1301.11.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-ykb97nsfmobq44xketrxs977@git.kernel.org
2014-05-14printk-kill.patchIngo Molnar
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14latency-hist.patchCarsten Emde
This patch provides a recording mechanism to store data of potential sources of system latencies. The recordings separately determine the latency caused by a delayed timer expiration, by a delayed wakeup of the related user space program and by the sum of both. The histograms can be enabled and reset individually. The data are accessible via the debug filesystem. For details please consult Documentation/trace/histograms.txt. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14softirq-split-out-code.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14suspend: Prevent might sleep splatsThomas Gleixner
timekeeping suspend/resume calls read_persistent_clock() which takes rtc_lock. That results in might sleep warnings because at that point we run with interrupts disabled. We cannot convert rtc_lock to a raw spinlock as that would trigger other might sleep warnings. As a temporary workaround we disable the might sleep warnings by setting system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and restoring it to SYSTEM_RUNNING after sysdev_resume(). Needs to be revisited. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14mm: pagefault_disabled()Peter Zijlstra
Wrap the test for pagefault_disabled() into a helper, this allows us to remove the need for current->pagefault_disabled on !-rt kernels. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3yy517m8zsi9fpsf14xfaqkw@git.kernel.org
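A sketch of what such a wrapper can look like (the !RT fallback shown here is an assumption about the surrounding fault-handler checks, not a quote of the patch):

  #include <linux/hardirq.h>
  #include <linux/sched.h>

  static inline bool pagefault_disabled(void)
  {
  #ifdef CONFIG_PREEMPT_RT_FULL
      /* decoupled from the preempt count, see the next entry below */
      return current->pagefault_disabled;
  #else
      /* classic behaviour: page faults are disabled in atomic context */
      return in_atomic();
  #endif
  }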
2014-05-14mm: Prepare decoupling the page fault disabling logicIngo Molnar
Add a pagefault_disabled variable to task_struct to allow decoupling the pagefault-disabled logic from the preempt count. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14signal/x86: Delay calling signals in atomicOleg Nesterov
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per-CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issue with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored on the task's task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Cc: stable-rt@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14signals: Allow rt tasks to cache one sigqueue structThomas Gleixner
To avoid allocation, allow RT tasks to cache one sigqueue struct in the task struct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
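A hedged sketch of the caching idea (the field and helper names are invented for illustration): the alloc path claims the cached entry with a cmpxchg(), and the free path parks one entry back on the task instead of handing it to the slab allocator.

  #include <linux/sched.h>
  #include <linux/signal.h>

  /* 'sigqueue_cache' stands in for a one-element per-task cache. */
  static struct sigqueue *sigqueue_cache_get(struct task_struct *t)
  {
      struct sigqueue *q = t->sigqueue_cache;

      /* claim the cached entry atomically, if there is one */
      if (q && cmpxchg(&t->sigqueue_cache, q, NULL) == q)
          return q;
      return NULL;
  }

  static bool sigqueue_cache_put(struct task_struct *t, struct sigqueue *q)
  {
      /* keep at most one spare; the caller frees it otherwise */
      return cmpxchg(&t->sigqueue_cache, NULL, q) == NULL;
  }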
2014-05-14posix-timers: Prevent broadcast signalsThomas Gleixner
POSIX timers should not send broadcast signals or kernel-only signals. Prevent it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14ptrace: fix ptrace vs tasklist_lock raceSebastian Andrzej Siewior
As explained by Alexander Fyodorov <halcy@yandex.ru>:

|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
|  child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
|  __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()

The patch is based on his initial patch where an additional check is added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is taken in case the caller is interrupted between looking into ->state and ->saved_state. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
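A sketch of the added check (the helper name is made up; ->saved_state exists only on RT kernels): the state is inspected under ->pi_lock so the caller cannot race with the child moving __TASK_TRACED between ->state and ->saved_state.

  static bool task_is_traced_checked(struct task_struct *task)
  {
      unsigned long flags;
      bool traced = false;

      raw_spin_lock_irqsave(&task->pi_lock, flags);
      if (task->state & __TASK_TRACED)
          traced = true;
  #ifdef CONFIG_PREEMPT_RT_FULL
      /* the flag may have moved here while the child blocks on the
       * sleeping tasklist_lock in ptrace_stop() */
      else if (task->saved_state & __TASK_TRACED)
          traced = true;
  #endif
      raw_spin_unlock_irqrestore(&task->pi_lock, flags);

      return traced;
  }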
2014-05-14signal-revert-ptrace-preempt-magic.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14tracing: Account for preempt off in preempt_schedule()Steven Rostedt
The preempt_schedule() uses the preempt_disable_notrace() version because it can cause infinite recursion by the function tracer as the function tracer uses preempt_enable_notrace() which may call back into the preempt_schedule() code as the NEED_RESCHED is still set and the PREEMPT_ACTIVE has not been set yet. See commit: d1f74e20b5b064a130cd0743a256c2d3cfe84010 that made this change. The preemptoff and preemptirqsoff latency tracers require the first and last preempt count modifiers to enable tracing. But this skips the checks. Since we cannot convert them back to the non-notrace version, we can use the idle() hooks for the latency tracers here. That is, the start/stop_critical_timings() works well to manually start and stop the latency tracer for preempt off timings. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14vtime-split-lock-and-seqcount.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14timekeeping-split-jiffies-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14sched: Consider pi boosting in setschedulerThomas Gleixner
If a PI boosted task's policy/priority is modified by a setscheduler() call, we unconditionally dequeue and requeue the task if it is on the runqueue, even if the new priority is lower than the current effective boosted priority. This can result in undesired reordering of the priority bucket list. If the new priority is less than or equal to the current effective priority, we just store the new parameters in the task struct and leave the scheduler class and the runqueue untouched. This is handled when the task deboosts itself. Only if the new priority is higher than the effective boosted priority do we apply the change immediately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org
2014-05-14sched: Queue RT tasks to head when prio dropsThomas Gleixner
The following scenario does not work correctly. The runqueue of CPUx contains two runnable and pinned tasks:

  T1: SCHED_FIFO, prio 80
  T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority ceiling scenario):

  sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
  ...
  sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
  ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back to sleep the scheduler picks T2. Surprise! The same happens w/o actual preemption when T1 is forced into the scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes pick_next_task() which returns T2. So T1 gets preempted and scheduled out.

This happens because sched_setscheduler() dequeues T1 from the prio 90 list and then enqueues it on the tail of the prio 80 list behind T2. This violates the POSIX spec and surprises user space which relies on the guarantee that SCHED_FIFO tasks are not scheduled out unless they give the CPU up voluntarily or are preempted by a higher priority task. In the latter case the preempted task must get back on the CPU after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a deboosted task to the head of the RT prio queue). The same treatment is necessary for sched_setscheduler(). So enqueue to head of the prio bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current behaviour, but it can be considered highly unlikely due to the corner case nature of the application scenario.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org
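A hedged sketch of the core of the change (the helper name is invented; ENQUEUE_HEAD is the mainline enqueue flag): when __sched_setscheduler() lowers the priority of a queued task, it is requeued at the head of its new priority bucket instead of the tail.

  static void requeue_after_setscheduler(struct rq *rq, struct task_struct *p,
                                         int oldprio, bool queued)
  {
      if (!queued)
          return;
      /*
       * A numerically larger prio value means a lower RT priority, so
       * oldprio < p->prio means the task was just demoted: keep it at
       * the head of the bucket so it stays on the CPU, as POSIX
       * expects for SCHED_FIFO.
       */
      enqueue_task(rq, p, oldprio < p->prio ? ENQUEUE_HEAD : 0);
  }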
2014-05-14sched: Adjust sched_reset_on_fork when nothing else changesThomas Gleixner
If the policy and priority remain unchanged a possible modification of sched_reset_on_fork gets lost in the early exit path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org
2014-05-14sched: Better debug output for might sleepThomas Gleixner
might_sleep() can tell us where interrupts have been disabled, but we have no idea what disabled preemption. Add some debug infrastructure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14sched: Check for idle task in might_sleep()Thomas Gleixner
Idle is not allowed to call sleeping functions ever! Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14sched: Init idle->on_rq in init_idle()Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14rcu: Don't activate RCU core on NO_HZ_FULL CPUsPaul E. McKenney
Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see if the RCU core needs anything from this CPU. If so, RCU raises RCU_SOFTIRQ to carry out any needed processing. This approach has worked well historically, but it is undesirable on NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time in userspace, so that scheduling-clock interrupts can be disabled while there is only one runnable task on the CPU in question. Unfortunately, raising any softirq has the potential to wake up ksoftirqd, which would provide the second runnable task on that CPU, preventing disabling of scheduling-clock interrupts. What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone, relying on the grace-period kthreads' quiescent-state forcing to do any needed RCU work on behalf of those CPUs. This commit therefore refrains from raising RCU_SOFTIRQ on any NO_HZ_FULL CPUs during any grace periods that have been in effect for less than one second. The one-second limit handles the case where an inappropriate workload is running on a NO_HZ_FULL CPU that features lots of scheduling-clock interrupts, but no idle or userspace time. Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <bitbucket@online.de> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
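A sketch of the kind of check this introduces (simplified; names only approximate the real helper): the scheduling-clock path skips raising RCU_SOFTIRQ on an adaptive-tick CPU unless the current grace period has already been pending for about a second.

  static bool rcu_skip_nohz_full_cpu(struct rcu_state *rsp)
  {
  #ifdef CONFIG_NO_HZ_FULL
      /* leave NO_HZ_FULL CPUs alone during young grace periods */
      if (tick_nohz_full_cpu(smp_processor_id()) &&
          (!rcu_gp_in_progress(rsp) ||
           ULONG_CMP_LT(jiffies, ACCESS_ONCE(rsp->gp_start) + HZ)))
          return true;
  #endif
      return false;
  }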
2014-05-14lockdep: Correctly annotate hardirq context in irq_exit()Peter Zijlstra
There was a reported deadlock on -rt which lockdep didn't report. It turns out that in irq_exit() we tell lockdep that the hardirq context ends and then do all kinds of locking afterwards. To fix it, move trace_hardirq_exit() to the very end of irq_exit(), this ensures all locking in tick_irq_exit() and rcu_irq_exit() are properly recorded as happening from hardirq context. This however leads to the 'fun' little problem of running softirqs while in hardirq context. To cure this make the softirq code a little more complex (in the CONFIG_TRACE_IRQFLAGS case). Due to stack swizzling arch dependent trickery we cannot pass an argument to __do_softirq() to tell it if it was done from hardirq context or not; so use a side-band argument. When we do __do_softirq() from hardirq context, 'atomically' flip to softirq context and back, so that no locking goes without being in either hard- or soft-irq context. I didn't find any new problems in mainline using this patch, but it did show the -rt problem. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14Reset to 3.12.19Scott Wood
2014-04-10rt,ntp: Move call to schedule_delayed_work() to helper threadSteven Rostedt
The ntp code for notify_cmos_timer() is called from a hard interrupt context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks that have been converted to mutexes, thus calling schedule_delayed_work() from interrupt is not safe. Add a helper thread that does the call to schedule_delayed_work and wake up that thread instead of calling schedule_delayed_work() directly. This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls schedule_delayed_work() directly in irq context. Note: There are a few places in the kernel that do this. Perhaps the RT code should have a dedicated thread that does the checks. Just register a notifier on boot up for your check and wake up the thread when needed. This will be a todo. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
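A hedged sketch of the helper-thread pattern described above (the thread, flag and function bodies are illustrative, not the patch itself): the hard-irq caller only sets a flag and wakes the kthread, which then calls schedule_delayed_work() from process context.

  #ifdef CONFIG_PREEMPT_RT_FULL
  static struct task_struct *cmos_sync_kthread;  /* created at boot */
  static bool cmos_sync_pending;

  static int cmos_sync_thread_fn(void *unused)
  {
      for (;;) {
          set_current_state(TASK_INTERRUPTIBLE);
          if (!cmos_sync_pending)
              schedule();
          __set_current_state(TASK_RUNNING);
          if (kthread_should_stop())
              break;
          cmos_sync_pending = false;
          /* process context: the workqueue's sleeping locks are fine here */
          schedule_delayed_work(&sync_cmos_work, 0);
      }
      return 0;
  }

  void notify_cmos_timer(void)
  {
      /* hard-irq context: only a raw task wakeup, no sleeping locks */
      cmos_sync_pending = true;
      wake_up_process(cmos_sync_kthread);
  }
  #else
  void notify_cmos_timer(void)
  {
      schedule_delayed_work(&sync_cmos_work, 0);
  }
  #endif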
2014-04-10completion: Use simple wait queuesThomas Gleixner
Completions have no long lasting callbacks and therefore do not need the complex waitqueue variant. Use simple waitqueues, which reduces the contention on the waitqueue lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rcu-more-swait-conversions.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merged Steven's change:

 static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
-	swait_wake(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+	wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
 }

Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10kernel/treercu: use a simple waitqueueSebastian Andrzej Siewior
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10simple-wait: rename and export the equivalent of waitqueue_active()Paul Gortmaker
The function "swait_head_has_waiters()" was internalized into wait-simple.c, but it parallels the waitqueue_active() of normal waitqueue support. Given that there are over 150 waitqueue_active users in drivers/ fs/ kernel/ and the like, let's make it globally visible, and rename it to parallel waitqueue_active() accordingly. We'll need to do this if we expect to expand its usage beyond RT. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10wait-simple: Rework for use with completionsThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10wait-simple: Simple waitqueue implementationThomas Gleixner
wait_queue is a Swiss army knife and in most of the cases the complexity is not needed. For RT waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long-lasting callbacks. Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> smp_mb() added by Steven Rostedt to fix a race condition with swait wakeups vs adding items to the list.
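The core of such a slim waitqueue is roughly the following (a sketch; the in-tree wait-simple code differs in detail): a raw spinlock plus a plain list of waiters, with no custom wake callbacks, so the time spent under the head lock is strictly bounded and the lock can stay raw on RT.

  struct swait_head {
      raw_spinlock_t   lock;
      struct list_head list;
  };

  struct swaiter {
      struct task_struct *task;
      struct list_head    node;
  };

  /* Wake up to 'num' waiters; 0 means all.  Caller holds head->lock. */
  static unsigned int __swait_wake_locked(struct swait_head *head,
                                          unsigned int state, unsigned int num)
  {
      struct swaiter *curr, *next;
      unsigned int woken = 0;

      list_for_each_entry_safe(curr, next, &head->list, node) {
          if (wake_up_state(curr->task, state)) {
              list_del_init(&curr->node);
              if (++woken == num)
                  break;
          }
      }
      return woken;
  }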
2014-04-10sched: Add support for lazy preemptionThomas Gleixner
It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which are right away preempting the waking task while the waking task holds a lock on which the woken task will block right after having preempted the wakee. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR task preemption and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the existing preempt_count, each task now sports a preempt_lazy_count which is manipulated on lock acquisition and release. This is slightly incorrect as for laziness reasons I coupled this on migrate_disable/enable so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefore allows the waking task to exit the lock held region before the woken task preempts. That also works better for cross CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter. Initial tests do not expose any observable latency increase, but history shows that I've been proven wrong before :) The lazy preemption mode is on by default, but with CONFIG_SCHED_DEBUG enabled it can be disabled via:

  # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

and reenabled via

  # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

The test results so far are very machine and workload dependent, but there is a clear trend that it enhances the non RT workload performance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
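In sketch form, the wakeup-side decision might look like this (simplified; TIF_NEED_RESCHED_LAZY and the PREEMPT_LAZY feature are the names used above, the helper itself is illustrative):

  /* Called from the fair sched class only, i.e. a SCHED_OTHER task wants
   * to preempt another SCHED_OTHER task.  The RT classes keep calling
   * resched_task() directly, so RT preemption behaviour is unchanged. */
  static void resched_task_lazy_sketch(struct task_struct *curr)
  {
      if (!sched_feat(PREEMPT_LAZY)) {
          resched_task(curr);
          return;
      }
      /*
       * Defer the preemption: the running task reschedules once it
       * leaves its lock-held (migrate-disabled) region and its
       * preempt_lazy_count drops back to zero.
       */
      set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
  }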
2014-04-10rcu: Eliminate softirq processing from rcutreePaul E. McKenney
Running RCU out of softirq is a problem for some workloads that would like to manage RCU core processing independently of other softirq work, for example, setting kthread priority. This commit therefore moves the RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread named rcuc. The SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10softirq: make migrate disable/enable conditioned on softirq_nestcnt transitionNicholas Mc Guire
This patch removes the recursive calls to migrate_disable/enable in local_bh_disable/enable. The softirq-local-lock.patch introduces local_bh_disable/enable which increments/decrements the current->softirq_nestcnt and disables/enables migration as well. As softirq_nestcnt (include/linux/sched.h, conditioned on CONFIG_PREEMPT_RT_BASE) already tracks the nesting level of the recursive calls to local_bh_disable/enable (all in kernel/softirq.c), there is no need to do it twice. migrate_disable/enable thus can be conditioned on softirq_nestcnt making a transition from 0-1 to disable migration and 1-0 to re-enable it. No change of functional behavior; this does noticeably reduce the observed nesting level of migrate_disable/enable. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
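In sketch form (heavily simplified from the real RT local_bh_disable/enable, which also run the pending softirqs and handle accounting), the 0->1 and 1->0 transitions are the only points where the migration state changes:

  void local_bh_disable(void)
  {
      if (++current->softirq_nestcnt == 1)
          migrate_disable();          /* only on the 0 -> 1 transition */
  }

  void local_bh_enable(void)
  {
      /* ... process softirqs raised in this context first ... */
      if (--current->softirq_nestcnt == 0)
          migrate_enable();           /* only on the 1 -> 0 transition */
  }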
2014-04-10softirq: Adapt NOHZ softirq pending check to new RT schemeThomas Gleixner
We can't rely on ksoftirqd anymore. We need to check the tasks which run a particular softirq, and if such a task is PI-blocked, ignore the other pending bits of that task as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10API cleanup - use local_lock not __local_lock for softNicholas Mc Guire
Trivial API cleanup - kernel/softirq.c was mimicking local_lock. No change of functional behavior. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10softirq: Split softirq locksThomas Gleixner
The 3.x RT series removed the split softirq implementation in favour of pushing softirq processing into the context of the thread which raised it. Though this prevents us from handling the various softirqs at different priorities. Now instead of reintroducing the split softirq threads we split the locks which serialize the softirq processing. If a softirq is raised in the context of a thread, then the softirq is noted on a per thread field, if the thread is in a bh disabled region. If the softirq is raised from hard interrupt context, then the bit is set in the flag field of ksoftirqd and ksoftirqd is invoked. When a thread leaves a bh disabled region, then it tries to execute the softirqs which have been raised in its own context. It acquires the per softirq / per cpu lock for the softirq and then checks whether the softirq is still pending in the per cpu local_softirq_pending() field. If yes, it runs the softirq. If no, then some other task executed it already. This allows for zero config softirq elevation in the context of user space tasks or interrupt threads. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
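A rough sketch of the resulting processing loop (the lock helpers and the per-task bitfield are named illustratively; the real code also handles preemption and accounting):

  /* hypothetical per-softirq, per-CPU lock helpers */
  void lock_softirq(int which);
  void unlock_softirq(int which);

  static void do_current_softirqs_sketch(void)
  {
      u32 pending = current->softirqs_raised;  /* bits raised in this context */
      int i;

      for (i = 0; pending; i++, pending >>= 1) {
          if (!(pending & 1))
              continue;

          lock_softirq(i);        /* serializes only this vector on this CPU */
          if (local_softirq_pending() & (1U << i)) {
              /* still pending: nobody else ran it in the meantime */
              set_softirq_pending(local_softirq_pending() & ~(1U << i));
              softirq_vec[i].action(&softirq_vec[i]);
          }
          unlock_softirq(i);
      }
      current->softirqs_raised = 0;
  }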
2014-04-10softirq: Split handling functionThomas Gleixner
Split out the inner handling function, so RT can reuse it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10softirq: Make serving softirqs a task flagThomas Gleixner
Avoid the percpu softirq_runner pointer magic by using a task flag. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>