summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-04-10lglocks-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rcu: Make ksoftirqd do RCU quiescent statesPaul E. McKenney
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable to network-based denial-of-service attacks. This patch therefore makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq() is running in ksoftirqd context. A wrapper layer in interposed so that other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying function __do_softirq_common() does the actual work. The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is that there might be a local_bh_enable() inside an RCU-preempt read-side critical section. This local_bh_enable() can invoke __do_softirq() directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be an illegal RCU-preempt quiescent state in the middle of an RCU-preempt read-side critical section. Therefore, quiescent states can only happen in cases where __do_softirq() is invoked directly from ksoftirqd. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rcu: Merge RCU-bh into RCU-preemptThomas Gleixner
The Linux kernel has long RCU-bh read-side critical sections that intolerably increase scheduling latency under mainline's RCU-bh rules, which include RCU-bh read-side critical sections being non-preemptible. This patch therefore arranges for RCU-bh to be implemented in terms of RCU-preempt for CONFIG_PREEMPT_RT_FULL=y. This has the downside of defeating the purpose of RCU-bh, namely, handling the case where the system is subjected to a network-based denial-of-service attack that keeps at least one CPU doing full-time softirq processing. This issue will be fixed by a later commit. The current commit will need some work to make it appropriate for mainline use, for example, it needs to be extended to cover Tiny RCU. [ paulmck: Added a useful changelog ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rtmutex: use a trylock for waiter lock in trylockSebastian Andrzej Siewior
Mike Galbraith captered the following: | >#11 [ffff88017b243e90] _raw_spin_lock at ffffffff815d2596 | >#12 [ffff88017b243e90] rt_mutex_trylock at ffffffff815d15be | >#13 [ffff88017b243eb0] get_next_timer_interrupt at ffffffff81063b42 | >#14 [ffff88017b243f00] tick_nohz_stop_sched_tick at ffffffff810bd1fd | >#15 [ffff88017b243f70] tick_nohz_irq_exit at ffffffff810bd7d2 | >#16 [ffff88017b243f90] irq_exit at ffffffff8105b02d | >#17 [ffff88017b243fb0] reschedule_interrupt at ffffffff815db3dd | >--- <IRQ stack> --- | >#18 [ffff88017a2a9bc8] reschedule_interrupt at ffffffff815db3dd | > [exception RIP: task_blocks_on_rt_mutex+51] | >#19 [ffff88017a2a9ce0] rt_spin_lock_slowlock at ffffffff815d183c | >#20 [ffff88017a2a9da0] lock_timer_base.isra.35 at ffffffff81061cbf | >#21 [ffff88017a2a9dd0] schedule_timeout at ffffffff815cf1ce | >#22 [ffff88017a2a9e50] rcu_gp_kthread at ffffffff810f9bbb | >#23 [ffff88017a2a9ed0] kthread at ffffffff810796d5 | >#24 [ffff88017a2a9f50] ret_from_fork at ffffffff815da04c lock_timer_base() does a try_lock() which deadlocks on the waiter lock not the lock itself. This patch takes the waiter_lock with trylock so it should work from interrupt context as well. If the fastpath doesn't work and the waiter_lock itself is taken then it seems that the lock itself taken. This patch also adds "rt_spin_unlock_after_trylock_in_irq" to keep lockdep happy. If we managed to take the wait_lock in the first place we should also be able to take it in the unlock path. Cc: stable-rt@vger.kernel.org Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10timers: do not raise softirq unconditionallyThomas Gleixner
Mike, On Thu, 7 Nov 2013, Mike Galbraith wrote: > On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote: > > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote: > > > > I bet you are trying to work around some of the side effects of the > > > occasional tick which is still necessary despite of full nohz, right? > > > > Nope, I wanted to check out cost of nohz_full for rt, and found that it > > doesn't work at all instead, looked, and found that the sole running > > task has just awakened ksoftirqd when it wants to shut the tick down, so > > that shutdown never happens. > > Like so in virgin 3.10-rt. Box is x3550 M3 booted nowatchdog > rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via > cpusets as well. well, that very same problem is in mainline if you add "threadirqs" to the command line. But we can be smart about this. The untested patch below should address that issue. If that works on mainline we can adapt it for RT (needs a trylock(&base->lock) there). Though it's not a full solution. It needs some thought versus the softirq code of timers. Assume we have only one timer queued 1000 ticks into the future. So this change will cause the timer softirq not to be called until that timer expires and then the timer softirq is going to do 1000 loops until it catches up with jiffies. That's anything but pretty ... What worries me more is this one: pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU] The CPU has no callbacks as you shoved them over to cpu 0, so why is the RCU softirq raised? Thanks, tglx ------------------ Message-id: <alpine.DEB.2.02.1311071158350.23353@ionos.tec.linutronix.de> |CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10timer-handle-idle-trylock-in-get-next-timer-irq.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rwlocks: Fix section mismatchJohn Kacur
This fixes the following build error for the preempt-rt kernel. make kernel/fork.o CC kernel/fork.o kernel/fork.c:90: error: section of tasklist_lock conflicts with previous declaration make[2]: *** [kernel/fork.o] Error 1 make[1]: *** [kernel/fork.o] Error 2 The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default. The non-rt kernels explicitly cache align only the tasklist_lock in kernel/fork.c That can create a build conflict. This fixes the build problem by making the non-rt kernels cache align RWLOCKs by default. The side effect is that the other RWLOCKs are also cache aligned for non-rt. This is a short term solution for rt only. The longer term solution would be to push the cache aligned DEFINE_RWLOCK to mainline. If there are objections, then we could create a DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature. Comments? Objections? Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rt: Cleanup of unnecessary do while 0 in read/write _lock()Nicholas Mc Guire
With the migration pushdonw a few of the do{ }while(0) loops became obsolete but got left over - this patch only removes this fallout. Patch applies on top of 3.12.9-rt13 Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10read_lock migrate_disable pushdown to rt_read_lockNicholas Mc Guire
pushdown of migrate_disable/enable from read_*lock* to the rt_read_*lock* api level general mapping to mutexes: read_*lock* `-> rt_read_*lock* `-> __spin_lock (the sleeping spin locks) `-> rt_mutex The real read_lock* mapping: read_lock_irqsave -. read_lock_irq `-> rt_read_lock_irqsave() `->read_lock ---------. \ read_lock_bh ------+ \ `--> rt_read_lock() if (rt_mutex_owner(lock) != current){ `-> __rt_spin_lock() rt_spin_lock_fastlock() `->rt_mutex_cmpxchg() migrate_disable() } rwlock->read_depth++; read_trylock mapping: read_trylock `-> rt_read_trylock if (rt_mutex_owner(lock) != current){ `-> rt_mutex_trylock() rt_mutex_fasttrylock() rt_mutex_cmpxchg() migrate_disable() } rwlock->read_depth++; read_unlock* mapping: read_unlock_bh --------+ read_unlock_irq -------+ read_unlock_irqrestore + read_unlock -----------+ `-> rt_read_unlock() if(--rwlock->read_depth==0){ `-> __rt_spin_unlock() rt_spin_lock_fastunlock() `-> rt_mutex_cmpxchg() migrate_disable() } So calls to migrate_disable/enable() are better placed at the rt_read_* level of lock/trylock/unlock as all of the read_*lock* API has this as a common path. In the rt_read* API of lock/trylock/unlock the nesting level is already being recorded in rwlock->read_depth, so we can push down the migrate disable/enable to that level and condition it on the read_depth going from 0 to 1 -> migrate_disable and 1 to 0 -> migrate_enable. This eliminates the recursive calls that were needed when migrate_disable/enable was done at the read_*lock* level. The approach to read_*_bh also eliminates the concerns raised with the regards to api inbalances (read_lock_bh -> read_unlock+local_bh_enable) Tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10write_lock migrate_disable pushdown to rt_write_lockNicholas Mc Guire
pushdown of migrate_disable/enable from write_*lock* to the rt_write_*lock* api level general mapping of write_*lock* to mutexes: write_*lock* `-> rt_write_*lock* `-> __spin_lock (the sleeping __spin_lock) `-> rt_mutex write_*lock*s are non-recursive so we have two lock chains to consider - write_trylock*/write_unlock - write_lock*/wirte_unlock for both paths the migration_disable/enable must be balanced. write_trylock* mapping: write_trylock_irqsave `-> rt_write_trylock_irqsave write_trylock \ `--------> rt_write_trylock ret = rt_mutex_trylock rt_mutex_fasttrylock rt_mutex_cmpxchg if (ret) migrate_disable write_lock* mapping: write_lock_irqsave `-> rt_write_lock_irqsave write_lock_irq -> write_lock ----. \ write_lock_bh -+ \ `-> rt_write_lock __rt_spin_lock() rt_spin_lock_fastlock() rt_mutex_cmpxchg() migrate_disable() write_unlock* mapping: write_unlock_irqrestore. write_unlock_bh -------+ write_unlock_irq -> write_unlock ----------+ `-> rt_write_unlock() __rt_spin_unlock() rt_spin_lock_fastunlock() rt_mutex_cmpxchg() migrate_enable() So calls to migrate_disable/enable() are better placed at the rt_write_* level of lock/trylock/unlock as all of the write_*lock* API has this as a common path. This approach to write_*_bh also eliminates the concerns raised with regards to api inbalances (write_lock_bh -> write_unlock+local_bh_enable) Tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10rt: Add the preempt-rt lock replacement APIsThomas Gleixner
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex based locking functions for preempt-rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rwsem-add-rt-variant.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rt-add-rt-to-mutex-headers.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rt-add-rt-spinlocks.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rtmutex-avoid-include-hell.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10spinlock-types-separate-raw.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rt-mutex-add-sleeping-spinlocks-support.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rtmutex-lock-killable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10local-vars-migrate-disable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10genirq: Allow disabling of softirq processing in irq thread contextThomas Gleixner
The processing of softirqs in irq thread context is a performance gain for the non-rt workloads of a system, but it's counterproductive for interrupts which are explicitely related to the realtime workload. Allow such interrupts to prevent softirq processing in their thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2014-04-10tasklet: Prevent tasklets from going into infinite spin in RTIngo Molnar
When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads, and spinlocks turn are mutexes. But this can cause issues with tasks disabling tasklets. A tasklet runs under ksoftirqd, and if a tasklets are disabled with tasklet_disable(), the tasklet count is increased. When a tasklet runs, it checks this counter and if it is set, it adds itself back on the softirq queue and returns. The problem arises in RT because ksoftirq will see that a softirq is ready to run (the tasklet softirq just re-armed itself), and will not sleep, but instead run the softirqs again. The tasklet softirq will still see that the count is non-zero and will not execute the tasklet and requeue itself on the softirq again, which will cause ksoftirqd to run it again and again and again. It gets worse because ksoftirqd runs as a real-time thread. If it preempted the task that disabled tasklets, and that task has migration disabled, or can't run for other reasons, the tasklet softirq will never run because the count will never be zero, and ksoftirqd will go into an infinite loop. As an RT task, it this becomes a big problem. This is a hack solution to have tasklet_disable stop tasklets, and when a tasklet runs, instead of requeueing the tasklet softirqd it delays it. When tasklet_enable() is called, and tasklets are waiting, then the tasklet_enable() will kick the tasklets to continue. This prevents the lock up from ksoftirq going into an infinite loop. [ rostedt@goodmis.org: ported to 3.0-rt ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10softirq-disable-softirq-stacks-for-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10softirq-local-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10lockdep-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10softirq: Sanitize softirq pending for NOHZ/RTThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched: teach migrate_disable about atomic contextsPeter Zijlstra
<NMI> [<ffffffff812dafd8>] spin_bug+0x94/0xa8 [<ffffffff812db07f>] do_raw_spin_lock+0x43/0xea [<ffffffff814fa9be>] _raw_spin_lock_irqsave+0x6b/0x85 [<ffffffff8106ff9e>] ? migrate_disable+0x75/0x12d [<ffffffff81078aaf>] ? pin_current_cpu+0x36/0xb0 [<ffffffff8106ff9e>] migrate_disable+0x75/0x12d [<ffffffff81115b9d>] pagefault_disable+0xe/0x1f [<ffffffff81047027>] copy_from_user_nmi+0x74/0xe6 [<ffffffff810489d7>] perf_callchain_user+0xf3/0x135 Now clearly we can't go around taking locks from NMI context, cure this by short-circuiting migrate_disable() when we're in an atomic context already. Add some extra debugging to avoid things like: preempt_disable() migrate_disable(); preempt_enable(); migrate_enable(); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314967297.1301.14.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-wbot4vsmwhi8vmbf83hsclk6@git.kernel.org
2014-04-10sched: Generic migrate_disablePeter Zijlstra
Make migrate_disable() be a preempt_disable() for !rt kernels. This allows generic code to use it but still enforces that these code sections stay relatively small. A preemptible migrate_disable() accessible for general use would allow people growing arbitrary per-cpu crap instead of clean these things up. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-275i87sl8e1jcamtchmehonm@git.kernel.org
2014-04-10migrate-disable-rt-variant.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10ftrace-migrate-disable-tracing.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched-migrate-disable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10hotplug: Lightweight get online cpusThomas Gleixner
get_online_cpus() is a heavy weight function which involves a global mutex. migrate_disable() wants a simpler construct which prevents only a CPU from going doing while a task is in a migrate disabled section. Implement a per cpu lockless mechanism, which serializes only in the real unplug case on a global mutex. That serialization affects only tasks on the cpu which should be brought down. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10cond-resched-lock-rt-tweak.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10cond-resched-softirq-fix.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched-might-sleep-do-not-account-rcu-depth.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched-rt-mutex-wakeup.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched-mmdrop-delayed.patchThomas Gleixner
Needs thread context (pgd_lock) -> ifdeffed. workqueues wont work with RT Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched-delay-put-task.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10posix-timers: thread posix-cpu-timers on -rtJohn Stultz
posix-cpu-timer code takes non -rt safe locks in hard irq context. Move it to a thread. [ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10hrtimer: fixup hrtimer callback changes for preempt-rtThomas Gleixner
In preempt-rt we can not call the callbacks which take sleeping locks from the timer interrupt context. Bring back the softirq split for now, until we fixed the signal delivery problem for real. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2014-04-10hrtimers: prepare full preemptionIngo Molnar
Make cancellation of a running callback in softirq context safe against preemption. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10timers: prepare for full preemptionIngo Molnar
When softirqs can be preempted we need to make sure that cancelling the timer from the active thread can not deadlock vs. a running timer callback. Add a waitqueue to resolve that. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10radix-tree-rt-aware.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10mm: shrink the page frame to !-rt sizePeter Zijlstra
He below is a boot-tested hack to shrink the page frame size back to normal. Should be a net win since there should be many less PTE-pages than page-frames. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10mm: make vmstat -rt awareIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10cpu-rt-variants.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10use local spin_locks in local_lockNicholas Mc Guire
Drop recursive call to migrate_disabel/enable for local_*lock* api reported by Steven Rostedt. local_lock will call migrate_disable via get_local_var - call tree is get_locked_var `-> local_lock(lvar) `-> __local_lock(&get_local_var(lvar)); `--> # define get_local_var(var) (*({ migrate_disable(); &__get_cpu_var(var); })) \ thus there should be no need to call migrate_disable/enable recursively in spin_try/lock/unlock. This patch addes a spin_trylock_local and replaces the migration disabling calls by the local calls. This patch is incomplete as it does not yet cover the _irq/_irqsave variants by local locks. This patch requires the API cleanup in kernel/softirq.c or it would break softirq_lock/unlock with respect to migration. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10rt-local-irq-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10local-var.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10genirq: do not invoke the affinity callback via a workqueueSebastian Andrzej Siewior
Joe Korty reported, that __irq_set_affinity_locked() schedules a workqueue while holding a rawlock which results in a might_sleep() warning. This patch moves the invokation into a process context so that we only wakeup() a process while holding the lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10genirq-force-threading.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>