summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-02-13fs: ntfs: disable interrupt only on !RTMike Galbraith
On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote: > * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > > > [10138.175796] [<c0105de3>] show_trace+0x12/0x14 > > > [10138.180291] [<c0105dfb>] dump_stack+0x16/0x18 > > > [10138.184769] [<c011609f>] native_smp_call_function_mask+0x138/0x13d > > > [10138.191117] [<c0117606>] smp_call_function+0x1e/0x24 > > > [10138.196210] [<c012f85c>] on_each_cpu+0x25/0x50 > > > [10138.200807] [<c0115c74>] flush_tlb_all+0x1e/0x20 > > > [10138.205553] [<c016caaf>] kmap_high+0x1b6/0x417 > > > [10138.210118] [<c011ec88>] kmap+0x4d/0x4f > > > [10138.214102] [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9 > > > [10138.220163] [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f > > > [10138.225352] [<c01a2b09>] bio_endio+0x42/0x6d > > > [10138.229769] [<c02c2a08>] __end_that_request_first+0x115/0x4ac > > > [10138.235682] [<c02c2da7>] end_that_request_chunk+0x8/0xa > > > [10138.241052] [<c0365943>] ide_end_request+0x55/0x10a > > > [10138.246058] [<c036dae3>] ide_dma_intr+0x6f/0xac > > > [10138.250727] [<c0366d83>] ide_intr+0x93/0x1e0 > > > [10138.255125] [<c015afb4>] handle_IRQ_event+0x5c/0xc9 > > > > Looks like ntfs is kmap()ing from interrupt context. Should be using > > kmap_atomic instead, I think. > > it's not atomic interrupt context but irq thread context - and -rt > remaps kmap_atomic() to kmap() internally. Hm. Looking at the change to mm/bounce.c, perhaps I should do this instead? Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13fs-block-rt-support.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13mm: Protect activate_mm() by preempt_[disable&enable]_rt()Yong Zhang
User preempt_*_rt instead of local_irq_*_rt or otherwise there will be warning on ARM like below: WARNING: at build/linux/kernel/smp.c:459 smp_call_function_many+0x98/0x264() Modules linked in: [<c0013bb4>] (unwind_backtrace+0x0/0xe4) from [<c001be94>] (warn_slowpath_common+0x4c/0x64) [<c001be94>] (warn_slowpath_common+0x4c/0x64) from [<c001bec4>] (warn_slowpath_null+0x18/0x1c) [<c001bec4>] (warn_slowpath_null+0x18/0x1c) from [<c0053ff8>](smp_call_function_many+0x98/0x264) [<c0053ff8>] (smp_call_function_many+0x98/0x264) from [<c0054364>] (smp_call_function+0x44/0x6c) [<c0054364>] (smp_call_function+0x44/0x6c) from [<c0017d50>] (__new_context+0xbc/0x124) [<c0017d50>] (__new_context+0xbc/0x124) from [<c009e49c>] (flush_old_exec+0x460/0x5e4) [<c009e49c>] (flush_old_exec+0x460/0x5e4) from [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) from [<c009d060>] (search_binary_handler+0x94/0x2a4) [<c009d060>] (search_binary_handler+0x94/0x2a4) from [<c009e8fc>] (do_execve+0x254/0x364) [<c009e8fc>] (do_execve+0x254/0x364) from [<c0010e84>] (sys_execve+0x34/0x54) [<c0010e84>] (sys_execve+0x34/0x54) from [<c000da00>] (ret_fast_syscall+0x0/0x30) ---[ end trace 0000000000000002 ]--- The reason is that ARM need irq enabled when doing activate_mm(). According to mm-protect-activate-switch-mm.patch, actually preempt_[disable|enable]_rt() is sufficient. Inspired-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1337061236-1766-1-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13fs: namespace preemption fixThomas Gleixner
On RT we cannot loop with preemption disabled here as mnt_make_readonly() might have been preempted. We can safely enable preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rt: Improve the serial console PASS_LIMITIngo Molnar
Beyond the warning: drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable] the solution of just looping infinitely was ugly - up it to 1 million to give it a chance to continue in some really ugly situation. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13drivers-tty-pl011-irq-disable-madness.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13drivers-tty-fix-omap-lock-crap.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13sparc: serial: Clean up the locking for -rtDavid Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Allen Pais <allen.pais@oracle.com>
2015-02-13serial: 8250: Clean up the locking for -rtIngo Molnar
In -RT the spin_lock_irqsave() does not spin but sleep if the lock is taken. Before that, local_irq_save() is invoked which disables interrupts even on -RT. Therefore local_irq_save() + spin_lock() does not work. In the ->sysrq and oops_in_progress case it is save to trylock the lock i.e. this is what we do now anyway except for ->sysrq where we assume that the lock is already taken. The spin_lock_irqsave() grabs the lock and disables the interrupts on vanilla (the same behavior) and on -RT it won't disable interrupts. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: add a patch description] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13lglocks-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs()Tiejun Chen
Any callers to the function rcu_preempt_qs() must disable irqs in order to protect the assignment to ->rcu_read_unlock_special. In RT case, rcu_bh_qs() as the wrapper of rcu_preempt_qs() is called in some scenarios where irq is enabled, like this path, do_single_softirq() | + local_irq_enable(); + handle_softirq() | | | + rcu_bh_qs() | | | + rcu_preempt_qs() | + local_irq_disable() So here we'd better disable irq directly inside of rcu_bh_qs() to fix this, otherwise the kernel may be freezable sometimes as observed. And especially this way is also kind and safe for the potential rcu_bh_qs() usage elsewhere in the future. Cc: stable-rt@vger.kernel.org Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Bin Jiang <bin.jiang@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13rcu: Make ksoftirqd do RCU quiescent statesPaul E. McKenney
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable to network-based denial-of-service attacks. This patch therefore makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq() is running in ksoftirqd context. A wrapper layer in interposed so that other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying function __do_softirq_common() does the actual work. The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is that there might be a local_bh_enable() inside an RCU-preempt read-side critical section. This local_bh_enable() can invoke __do_softirq() directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be an illegal RCU-preempt quiescent state in the middle of an RCU-preempt read-side critical section. Therefore, quiescent states can only happen in cases where __do_softirq() is invoked directly from ksoftirqd. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rcu-more-fallout.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rcu: Merge RCU-bh into RCU-preemptThomas Gleixner
The Linux kernel has long RCU-bh read-side critical sections that intolerably increase scheduling latency under mainline's RCU-bh rules, which include RCU-bh read-side critical sections being non-preemptible. This patch therefore arranges for RCU-bh to be implemented in terms of RCU-preempt for CONFIG_PREEMPT_RT_FULL=y. This has the downside of defeating the purpose of RCU-bh, namely, handling the case where the system is subjected to a network-based denial-of-service attack that keeps at least one CPU doing full-time softirq processing. This issue will be fixed by a later commit. The current commit will need some work to make it appropriate for mainline use, for example, it needs to be extended to cover Tiny RCU. [ paulmck: Added a useful changelog ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rcu: Frob softirq testPeter Zijlstra
With RT_FULL we get the below wreckage: [ 126.060484] ======================================================= [ 126.060486] [ INFO: possible circular locking dependency detected ] [ 126.060489] 3.0.1-rt10+ #30 [ 126.060490] ------------------------------------------------------- [ 126.060492] irq/24-eth0/1235 is trying to acquire lock: [ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060503] [ 126.060504] but task is already holding lock: [ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060511] [ 126.060511] which lock already depends on the new lock. [ 126.060513] [ 126.060514] [ 126.060514] the existing dependency chain (in reverse order) is: [ 126.060516] [ 126.060516] -> #1 (&p->pi_lock){-...-.}: [ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85 [ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f [ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a [ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f [ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde [ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b [ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060551] [ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}: [ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060611] [ 126.060612] other info that might help us debug this: [ 126.060614] [ 126.060615] Possible unsafe locking scenario: [ 126.060616] [ 126.060617] CPU0 CPU1 [ 126.060619] ---- ---- [ 126.060620] lock(&p->pi_lock); [ 126.060623] lock(&(lock)->wait_lock); [ 126.060625] lock(&p->pi_lock); [ 126.060627] lock(&(lock)->wait_lock); [ 126.060629] [ 126.060629] *** DEADLOCK *** [ 126.060630] [ 126.060632] 1 lock held by irq/24-eth0/1235: [ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060638] [ 126.060638] stack backtrace: [ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30 [ 126.060643] Call Trace: [ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a [ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99 [ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c [ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060680] [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4 [ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5 [ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90 [ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21 [ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d [ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f [ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f [ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59 [ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a [ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a [ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe [ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c [ 126.060774] [<ffffffff81509b10>] ? gs_change+0xb/0xb Because irq_exit() does: void irq_exit(void) { account_system_vtime(current); trace_hardirq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); if (!in_interrupt() && local_softirq_pending()) invoke_softirq(); ... } Which triggers a wakeup, which uses RCU, now if the interrupted task has t->rcu_read_unlock_special set, the rcu usage from the wakeup will end up in rcu_read_unlock_special(). rcu_read_unlock_special() will test for in_irq(), which will fail as we just decremented preempt_count with IRQ_EXIT_OFFSET, and in_sering_softirq(), which for PREEMPT_RT_FULL reads: int in_serving_softirq(void) { int res; preempt_disable(); res = __get_cpu_var(local_softirq_runner) == current; preempt_enable(); return res; } Which will thus also fail, resulting in the above wreckage. The 'somewhat' ugly solution is to open-code the preempt_count() test in rcu_read_unlock_special(). Also, we're not at all sure how ->rcu_read_unlock_special gets set here... so this is very likely a bandaid and more thought is required. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2015-02-13rtmutex: use a trylock for waiter lock in trylockSebastian Andrzej Siewior
Mike Galbraith captered the following: | >#11 [ffff88017b243e90] _raw_spin_lock at ffffffff815d2596 | >#12 [ffff88017b243e90] rt_mutex_trylock at ffffffff815d15be | >#13 [ffff88017b243eb0] get_next_timer_interrupt at ffffffff81063b42 | >#14 [ffff88017b243f00] tick_nohz_stop_sched_tick at ffffffff810bd1fd | >#15 [ffff88017b243f70] tick_nohz_irq_exit at ffffffff810bd7d2 | >#16 [ffff88017b243f90] irq_exit at ffffffff8105b02d | >#17 [ffff88017b243fb0] reschedule_interrupt at ffffffff815db3dd | >--- <IRQ stack> --- | >#18 [ffff88017a2a9bc8] reschedule_interrupt at ffffffff815db3dd | > [exception RIP: task_blocks_on_rt_mutex+51] | >#19 [ffff88017a2a9ce0] rt_spin_lock_slowlock at ffffffff815d183c | >#20 [ffff88017a2a9da0] lock_timer_base.isra.35 at ffffffff81061cbf | >#21 [ffff88017a2a9dd0] schedule_timeout at ffffffff815cf1ce | >#22 [ffff88017a2a9e50] rcu_gp_kthread at ffffffff810f9bbb | >#23 [ffff88017a2a9ed0] kthread at ffffffff810796d5 | >#24 [ffff88017a2a9f50] ret_from_fork at ffffffff815da04c lock_timer_base() does a try_lock() which deadlocks on the waiter lock not the lock itself. This patch takes the waiter_lock with trylock so it should work from interrupt context as well. If the fastpath doesn't work and the waiter_lock itself is taken then it seems that the lock itself taken. This patch also adds "rt_spin_unlock_after_trylock_in_irq" to keep lockdep happy. If we managed to take the wait_lock in the first place we should also be able to take it in the unlock path. Cc: stable-rt@vger.kernel.org Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13timer/rt: Always raise the softirq if there's irq_work to be doneSteven Rostedt
It was previously discovered that some systems would hang on boot up with a previous version of 3.12-rt. This was due to RCU using irq_work, and RT defers the irq_work to a softirq. But if there's no active timers, the softirq will not be raised, and RCU work will not get done, causing the system to hang. The fix was to check that if there was no active timers but irq_work to be done, then we should raise the softirq. But this fix was not 100% correct. It left out the case that there were active timers that were not expired yet. This would have the softirq not get raised even if there was irq work to be done. If there is irq_work to be done, then we must raise the timer softirq regardless of if there is active timers or whether they are expired or not. The softirq can handle those cases. But we can never ignore irq_work. As it is only PREEMPT_RT_FULL that requires irq_work to be done in the softirq, we can pull out the check in the active_timers condition, and make the code a bit cleaner by having the irq_work check separate, and put the code in with the other #ifdef PREEMPT_RT. If there is irq_work to be done, there's no need to check the active timers or if they are expired. Just raise the time softirq and be done with it. Otherwise, we can do the timer checks just like we do with non -rt. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13timer: Raise softirq if there's irq_workSteven Rostedt
[ Talking with Sebastian on IRC, it seems that doing the irq_work_run() from the interrupt in -rt is a bad thing. Here we simply raise the softirq if there's irq work to do. This too boots on my i7 ] After trying hard to figure out why my i7 box was locking up with the new active_timers code, that does not run the timer softirq if there are no active timers, I took an extra look at the softirq handler and noticed that it doesn't just run timer softirqs, it also runs irq work. This was the bug that was locking up the system. It wasn't missing a timer, it was missing irq work. By always doing the irq work callbacks, the system boots fine. The missing irq work callback was the RCU's sp_wakeup() function. No need to check for defined(CONFIG_IRQ_WORK). When that's not set the "irq_work_needs_cpu()" is a static inline that returns false. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13timers: do not raise softirq unconditionallyThomas Gleixner
Mike, On Thu, 7 Nov 2013, Mike Galbraith wrote: > On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote: > > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote: > > > > I bet you are trying to work around some of the side effects of the > > > occasional tick which is still necessary despite of full nohz, right? > > > > Nope, I wanted to check out cost of nohz_full for rt, and found that it > > doesn't work at all instead, looked, and found that the sole running > > task has just awakened ksoftirqd when it wants to shut the tick down, so > > that shutdown never happens. > > Like so in virgin 3.10-rt. Box is x3550 M3 booted nowatchdog > rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via > cpusets as well. well, that very same problem is in mainline if you add "threadirqs" to the command line. But we can be smart about this. The untested patch below should address that issue. If that works on mainline we can adapt it for RT (needs a trylock(&base->lock) there). Though it's not a full solution. It needs some thought versus the softirq code of timers. Assume we have only one timer queued 1000 ticks into the future. So this change will cause the timer softirq not to be called until that timer expires and then the timer softirq is going to do 1000 loops until it catches up with jiffies. That's anything but pretty ... What worries me more is this one: pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU] The CPU has no callbacks as you shoved them over to cpu 0, so why is the RCU softirq raised? Thanks, tglx ------------------ Message-id: <alpine.DEB.2.02.1311071158350.23353@ionos.tec.linutronix.de> |CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13timer-handle-idle-trylock-in-get-next-timer-irq.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rwlocks: Fix section mismatchJohn Kacur
This fixes the following build error for the preempt-rt kernel. make kernel/fork.o CC kernel/fork.o kernel/fork.c:90: error: section of tasklist_lock conflicts with previous declaration make[2]: *** [kernel/fork.o] Error 1 make[1]: *** [kernel/fork.o] Error 2 The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default. The non-rt kernels explicitly cache align only the tasklist_lock in kernel/fork.c That can create a build conflict. This fixes the build problem by making the non-rt kernels cache align RWLOCKs by default. The side effect is that the other RWLOCKs are also cache aligned for non-rt. This is a short term solution for rt only. The longer term solution would be to push the cache aligned DEFINE_RWLOCK to mainline. If there are objections, then we could create a DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature. Comments? Objections? Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13bad return value in __mutex_lock_check_stampNicholas Mc Guire
Bad return value in _mutex_lock_check_stamp - this problem only would show up with 3.12.1 rt4 applied but CONFIG_PREEMPT_RT_FULL not enabled currently it would be returning what ever vprintk_emit ended up with (atleast on x86), which probably is not the intended behavior. Added a return 0; as in the case with CONFIG_PREEMPT_RT_FULL enabled. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13rtmutex: add a first shot of ww_mutexSebastian Andrzej Siewior
lockdep says: | -------------------------------------------------------------------------- | | Wound/wait tests | | --------------------- | ww api failures: ok | ok | ok | | ww contexts mixing: ok | ok | | finishing ww context: ok | ok | ok | ok | | locking mismatches: ok | ok | ok | | EDEADLK handling: ok | ok | ok | ok | ok | ok | ok | ok | ok | ok | | spinlock nest unlocked: ok | | ----------------------------------------------------- | |block | try |context| | ----------------------------------------------------- | context: ok | ok | ok | | try: ok | ok | ok | | block: ok | ok | ok | | spinlock: ok | ok | ok | Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
2015-02-13percpu-rwsem: compile fixSebastian Andrzej Siewior
The shortcut on mainline skip lockdep. No idea why this is a good thing. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13rt: Cleanup of unnecessary do while 0 in read/write _lock()Nicholas Mc Guire
With the migration pushdonw a few of the do{ }while(0) loops became obsolete but got left over - this patch only removes this fallout. Patch applies on top of 3.12.9-rt13 Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13read_lock migrate_disable pushdown to rt_read_lockNicholas Mc Guire
pushdown of migrate_disable/enable from read_*lock* to the rt_read_*lock* api level general mapping to mutexes: read_*lock* `-> rt_read_*lock* `-> __spin_lock (the sleeping spin locks) `-> rt_mutex The real read_lock* mapping: read_lock_irqsave -. read_lock_irq `-> rt_read_lock_irqsave() `->read_lock ---------. \ read_lock_bh ------+ \ `--> rt_read_lock() if (rt_mutex_owner(lock) != current){ `-> __rt_spin_lock() rt_spin_lock_fastlock() `->rt_mutex_cmpxchg() migrate_disable() } rwlock->read_depth++; read_trylock mapping: read_trylock `-> rt_read_trylock if (rt_mutex_owner(lock) != current){ `-> rt_mutex_trylock() rt_mutex_fasttrylock() rt_mutex_cmpxchg() migrate_disable() } rwlock->read_depth++; read_unlock* mapping: read_unlock_bh --------+ read_unlock_irq -------+ read_unlock_irqrestore + read_unlock -----------+ `-> rt_read_unlock() if(--rwlock->read_depth==0){ `-> __rt_spin_unlock() rt_spin_lock_fastunlock() `-> rt_mutex_cmpxchg() migrate_disable() } So calls to migrate_disable/enable() are better placed at the rt_read_* level of lock/trylock/unlock as all of the read_*lock* API has this as a common path. In the rt_read* API of lock/trylock/unlock the nesting level is already being recorded in rwlock->read_depth, so we can push down the migrate disable/enable to that level and condition it on the read_depth going from 0 to 1 -> migrate_disable and 1 to 0 -> migrate_enable. This eliminates the recursive calls that were needed when migrate_disable/enable was done at the read_*lock* level. The approach to read_*_bh also eliminates the concerns raised with the regards to api inbalances (read_lock_bh -> read_unlock+local_bh_enable) Tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13write_lock migrate_disable pushdown to rt_write_lockNicholas Mc Guire
pushdown of migrate_disable/enable from write_*lock* to the rt_write_*lock* api level general mapping of write_*lock* to mutexes: write_*lock* `-> rt_write_*lock* `-> __spin_lock (the sleeping __spin_lock) `-> rt_mutex write_*lock*s are non-recursive so we have two lock chains to consider - write_trylock*/write_unlock - write_lock*/wirte_unlock for both paths the migration_disable/enable must be balanced. write_trylock* mapping: write_trylock_irqsave `-> rt_write_trylock_irqsave write_trylock \ `--------> rt_write_trylock ret = rt_mutex_trylock rt_mutex_fasttrylock rt_mutex_cmpxchg if (ret) migrate_disable write_lock* mapping: write_lock_irqsave `-> rt_write_lock_irqsave write_lock_irq -> write_lock ----. \ write_lock_bh -+ \ `-> rt_write_lock __rt_spin_lock() rt_spin_lock_fastlock() rt_mutex_cmpxchg() migrate_disable() write_unlock* mapping: write_unlock_irqrestore. write_unlock_bh -------+ write_unlock_irq -> write_unlock ----------+ `-> rt_write_unlock() __rt_spin_unlock() rt_spin_lock_fastunlock() rt_mutex_cmpxchg() migrate_enable() So calls to migrate_disable/enable() are better placed at the rt_write_* level of lock/trylock/unlock as all of the write_*lock* API has this as a common path. This approach to write_*_bh also eliminates the concerns raised with regards to api inbalances (write_lock_bh -> write_unlock+local_bh_enable) Tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13migrate_disable pushd down in rt_write_trylock_irqsaveNicholas Mc Guire
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13migrate_disable pushd down in rt_spin_trylock_irqsaveNicholas Mc Guire
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13migrate_disable pushd down in atomic_dec_and_spin_lockNicholas Mc Guire
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13condition migration_disable on lock acquisitionNicholas Mc Guire
No need to unconditionally migrate_disable (what is it protecting ?) and re-enable on failure to acquire the lock. This patch moves the migrate_disable to be conditioned on sucessful lock acquisition only. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13rt: Add the preempt-rt lock replacement APIsThomas Gleixner
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex based locking functions for preempt-rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rwsem-add-rt-variant.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rt-add-rt-to-mutex-headers.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rt-add-rt-spinlocks.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rtmutex-avoid-include-hell.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13spinlock-types-separate-raw.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rt-mutex-add-sleeping-spinlocks-support.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rtmutex-lock-killable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lockThomas Gleixner
In exit_pi_state_list() we have the following locking construct: spin_lock(&hb->lock); raw_spin_lock_irq(&curr->pi_lock); ... spin_unlock(&hb->lock); In !RT this works, but on RT the migrate_enable() function which is called from spin_unlock() sees atomic context due to the held pi_lock and just decrements the migrate_disable_atomic counter of the task. Now the next call to migrate_disable() sees the counter being negative and issues a warning. That check should be in migrate_enable() already. Fix this by dropping pi_lock before unlocking hb->lock and reaquire pi_lock after that again. This is safe as the loop code reevaluates head again under the pi_lock. Reported-by: Yong Zhang <yong.zhang@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13futex: Fix bug on when a requeued RT task times outSteven Rostedt
Requeue with timeout causes a bug with PREEMPT_RT_FULL. The bug comes from a timed out condition. TASK 1 TASK 2 ------ ------ futex_wait_requeue_pi() futex_wait_queue_me() <timed out> double_lock_hb(); raw_spin_lock(pi_lock); if (current->pi_blocked_on) { } else { current->pi_blocked_on = PI_WAKE_INPROGRESS; run_spin_unlock(pi_lock); spin_lock(hb->lock); <-- blocked! plist_for_each_entry_safe(this) { rt_mutex_start_proxy_lock(); task_blocks_on_rt_mutex(); BUG_ON(task->pi_blocked_on)!!!! The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to grab the hb->lock, which it fails to do so. As the hb->lock is a mutex, it will block and set the "pi_blocked_on" to the hb->lock. When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGESS fails because the task1's pi_blocked_on is no longer set to that, but instead, set to the hb->lock. The fix: When calling rt_mutex_start_proxy_lock() a check is made to see if the proxy tasks pi_blocked_on is set. If so, exit out early. Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies the proxy task that it is being requeued, and will handle things appropriately. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13rtmutex-futex-prepare-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13md: raid5: Make raid5_percpu handling RT awareThomas Gleixner
__raid_run_ops() disables preemption with get_cpu() around the access to the raid5_percpu variables. That causes scheduling while atomic spews on RT. Serialize the access to the percpu data with a lock and keep the code preemptible. Reported-by: Udo van den Heuvel <udovdh@xs4all.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
2015-02-13local-vars-migrate-disable.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13genirq: Allow disabling of softirq processing in irq thread contextThomas Gleixner
The processing of softirqs in irq thread context is a performance gain for the non-rt workloads of a system, but it's counterproductive for interrupts which are explicitely related to the realtime workload. Allow such interrupts to prevent softirq processing in their thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2015-02-13tasklet: Prevent tasklets from going into infinite spin in RTIngo Molnar
When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads, and spinlocks turn are mutexes. But this can cause issues with tasks disabling tasklets. A tasklet runs under ksoftirqd, and if a tasklets are disabled with tasklet_disable(), the tasklet count is increased. When a tasklet runs, it checks this counter and if it is set, it adds itself back on the softirq queue and returns. The problem arises in RT because ksoftirq will see that a softirq is ready to run (the tasklet softirq just re-armed itself), and will not sleep, but instead run the softirqs again. The tasklet softirq will still see that the count is non-zero and will not execute the tasklet and requeue itself on the softirq again, which will cause ksoftirqd to run it again and again and again. It gets worse because ksoftirqd runs as a real-time thread. If it preempted the task that disabled tasklets, and that task has migration disabled, or can't run for other reasons, the tasklet softirq will never run because the count will never be zero, and ksoftirqd will go into an infinite loop. As an RT task, it this becomes a big problem. This is a hack solution to have tasklet_disable stop tasklets, and when a tasklet runs, instead of requeueing the tasklet softirqd it delays it. When tasklet_enable() is called, and tasklets are waiting, then the tasklet_enable() will kick the tasklets to continue. This prevents the lock up from ksoftirq going into an infinite loop. [ rostedt@goodmis.org: ported to 3.0-rt ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13softirq-make-fifo.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13softirq-disable-softirq-stacks-for-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13softirq-local-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13mutex-no-spin-on-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>