path: root/arch
Age  Commit message  Author
2015-02-13  cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT  (Thomas Gleixner)
We can't deal with the cpumask allocations which happen in atomic context (see arch/x86/kernel/apic/io_apic.c) on RT right now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  crypto: Reduce preempt disabled regions, more algos  (Sebastian Andrzej Siewior)
Don Estabrook reported:

| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()

and his backtrace showed some crypto functions which looked fine. The problem is the following sequence:

glue_xts_crypt_128bit()
{
        blkcipher_walk_virt();          /* normal migrate_disable() */

        glue_fpu_begin();               /* get atomic */

        while (nbytes) {
                __glue_xts_crypt_128bit();
                blkcipher_walk_done();  /* with nbytes = 0, migrate_enable()
                                         * while we are atomic */
        };
        glue_fpu_end()                  /* no longer atomic */
}

and this is why the counter gets out of sync and the warning is printed. The other problem is that we are non-preemptible between glue_fpu_begin() and glue_fpu_end(), so the latency grows. To fix this, I shorten the FPU-off region and ensure blkcipher_walk_done() is called with preemption enabled. This might hurt performance because we now enable/disable the FPU state more often, but we gain lower latency and the bug is gone.

Cc: stable-rt@vger.kernel.org
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
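A rough sketch of the shortened region (illustrative only, loosely following the glue_helper-style walk loop above; names such as gctx, desc, walk, bsize and fpu_blocks_limit are assumptions, not the exact upstream diff):

        while ((nbytes = walk.nbytes)) {
                /* take the FPU only around the actual crypt step ... */
                fpu_enabled = glue_fpu_begin(bsize, fpu_blocks_limit,
                                             desc, fpu_enabled, nbytes);
                nbytes = __glue_xts_crypt_128bit(gctx, desc, &walk);
                /* ... and drop it before blkcipher_walk_done(), so the walk
                 * cleanup runs with preemption enabled again */
                glue_fpu_end(fpu_enabled);
                fpu_enabled = false;
                err = blkcipher_walk_done(desc, &walk, nbytes);
        }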
2015-02-13  x86: crypto: Reduce preempt disabled regions  (Peter Zijlstra)
Restrict the preempt-disabled regions to the actual floating point operations and enable preemption for the administrative actions. This is necessary on RT to avoid kfree and other operations being called with preemption disabled. Reported-and-tested-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  x86-kvm-require-const-tsc-for-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  arm-enable-highmem-for-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  arm/highmem: flush tlb on unmap  (Sebastian Andrzej Siewior)
The TLB should be flushed on unmap, thus making the mapping entry invalid. This is only done in the non-debug case, which does not look right. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
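A minimal sketch of the idea in the ARM __kunmap_atomic() path (assuming the 3.x highmem code; the surrounding context is abbreviated):

        #ifdef CONFIG_DEBUG_HIGHMEM
                set_top_pte(vaddr, __pte(0));
        #else
                (void) idx;             /* to kill a warning */
        #endif
                /* flush the TLB in the non-debug case too, so the stale
                 * mapping entry really becomes invalid */
                local_flush_tlb_kernel_page(vaddr);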
2015-02-13  x86/highmem: add a "already used pte" check  (Sebastian Andrzej Siewior)
This is a copy from kmap_atomic_prot(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  mm, rt: kmap_atomic scheduling  (Peter Zijlstra)
In fact, with migrate_disable() existing, one could play games with kmap_atomic. You could save/restore the kmap_atomic slots on context switch (if there are any in use, of course); this should be especially easy now that we have a kmap_atomic stack. Something like the below... it wants all the preempt_disable() stuff replaced with pagefault_disable() && migrate_disable(), of course, but then you can flip kmaps around like below. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [dvhart@linux.intel.com: build fix] Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins [tglx@linutronix.de: Get rid of the per cpu variable and store the idx and the pte content right away in the task struct. Shortens the context switch code.]
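A sketch of where that ends up (field names and the context-switch hook are illustrative, derived from the description above, not the final RT code): the atomic kmap slots are kept in the task and re-installed when it is switched back in.

        /* illustrative task_struct fields */
        int     kmap_idx;
        pte_t   kmap_pte[KM_TYPE_NR];

        /* called for the incoming task from switch_to() */
        static inline void switch_kmaps(struct task_struct *next_p)
        {
                int i;

                for (i = 0; i < next_p->kmap_idx; i++) {
                        int idx = i + KM_TYPE_NR * smp_processor_id();

                        /* re-install this task's saved atomic kmaps */
                        set_pte(kmap_pte - idx, next_p->kmap_pte[i]);
                }
        }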
2015-02-13  mips-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  arm/unwind: use a raw_spin_lock  (Sebastian Andrzej Siewior)
Mostly, unwind is done with irqs enabled; however, SLUB may call it with irqs disabled while creating a new SLUB cache. I had a system freeze while loading a module which called kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled interrupts and then:

->new_slab_objects()
 ->new_slab()
  ->setup_object()
   ->setup_object_debug()
    ->init_tracking()
     ->set_track()
      ->save_stack_trace()
       ->save_stack_trace_tsk()
        ->walk_stackframe()
         ->unwind_frame()
          ->unwind_find_idx()
           =>spin_lock_irqsave(&unwind_lock);

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
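The change itself is small; roughly (a sketch against arch/arm/kernel/unwind.c):

        -static DEFINE_SPINLOCK(unwind_lock);
        +static DEFINE_RAW_SPINLOCK(unwind_lock);
        ...
        -       spin_lock_irqsave(&unwind_lock, flags);
        +       raw_spin_lock_irqsave(&unwind_lock, flags);
        ...
        -       spin_unlock_irqrestore(&unwind_lock, flags);
        +       raw_spin_unlock_irqrestore(&unwind_lock, flags);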
2015-02-13  arm-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  power-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  Powerpc: Use generic rwsem on RT  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  irq_work: allow certain work in hard irq context  (Sebastian Andrzej Siewior)
irq_work is processed in softirq context on -RT because we want to avoid long latencies which might arise from processing lots of perf events. The noHZ-full mode requires its callback to be called from real hardirq context (commit 76c24fb ("nohz: New APIs to re-evaluate the tick on full dynticks CPUs")). If it is called from a thread context, we might get wrong results for checks like "is_idle_task(current)". This patch introduces a second list (hirq_work_list) which is used if irq_work_run() has been invoked from hardirq context, and processes only work items marked with IRQ_WORK_HARD_IRQ. This patch also removes arch_irq_work_raise() from sparc & powerpc like it is already done for x86. At least for powerpc it is somewhat superfluous because it is called from the timer interrupt, which should invoke update_process_times(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
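Sketched queueing logic (the list names follow the changelog; the real patch differs in detail):

        static void __irq_work_queue(struct irq_work *work)
        {
                if (work->flags & IRQ_WORK_HARD_IRQ)
                        /* must be processed from real hardirq context,
                         * e.g. the nohz-full tick re-evaluation */
                        llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list));
                else
                        /* everything else runs from softirq on -RT */
                        llist_add(&work->llnode, this_cpu_ptr(&irq_work_list));
        }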
2015-02-13  x86-no-perf-irq-work-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  x86: Use generic rwsem_spinlocks on -rt  (Thomas Gleixner)
Simplifies the separation of anon_rw_semaphores and rw_semaphores for -rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  x86: stackprotector: Avoid random pool on rt  (Thomas Gleixner)
CPU bringup calls into the random pool to initialize the stack canary. During boot that works nicely even on RT, as the might-sleep checks are disabled. During CPU hotplug the might-sleep checks trigger. Making the locks in random raw is a major PITA, so avoiding the call on RT is the only sensible solution. This is basically the same randomness which we get during boot, where the random pool has no entropy and we rely on TSC randomness. Reported-by: Carsten Emde <carsten.emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
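Sketch of the affected canary setup (assuming the stackprotector code of that era; the CONFIG_PREEMPT_RT_FULL guard is the point, the rest is abbreviated):

        static __always_inline void boot_init_stack_canary(void)
        {
                u64 canary = 0;
                u64 tsc;

        #ifndef CONFIG_PREEMPT_RT_FULL
                /* on RT this path may hit sleeping locks during CPU hotplug */
                get_random_bytes(&canary, sizeof(canary));
        #endif
                /* mix in / fall back to TSC "randomness", as during early boot */
                rdtscll(tsc);
                canary += tsc + (tsc << 32UL);

                current->stack_canary = canary;
                /* (the per-cpu copy is updated as well in the real code) */
        }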
2015-02-13  x86/mce: Defer mce wakeups to threads for PREEMPT_RT  (Steven Rostedt)
We had a customer report a lockup on a 3.0-rt kernel that had the following backtrace:

[ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
[ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
[ffff88107fca3f50] __wake_up at ffffffff81043379
[ffff88107fca3f80] mce_notify_irq at ffffffff81017328
[ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
[ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
[ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853

It actually bugged because the lock was taken by the same owner that already had that lock. What happened was the thread that was setting itself on a wait queue had the lock when an MCE triggered. The MCE interrupt does a wake up on its wait list and grabs the same lock.

NOTE: THIS IS NOT A BUG ON MAINLINE

Sorry for yelling, but as I Cc'd mainline maintainers I want them to know that this is a PREEMPT_RT bug only. I only Cc'd them for advice.

On PREEMPT_RT the wait queue locks are converted from normal "spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above). These are not to be taken by hard interrupt context. This usually isn't a problem, as almost all interrupts in PREEMPT_RT are converted into schedulable threads. Unfortunately that's not the case with the MCE irq.

As wait queue locks are notorious for long hold times, we cannot convert them to raw_spin_locks without causing issues with -rt. But Thomas has created a "simple-wait" structure that uses raw spin locks, which may have been a good fit. Unfortunately, wait queues are not the only issue, as mce_notify_irq also does a schedule_work(), which grabs the workqueue spin locks that have the exact same issue.

Thus, the patch I'm proposing moves the actual work of the MCE interrupt into a helper thread that gets woken up on the MCE interrupt and does the work in a schedulable context.

NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET

Oops, sorry for yelling again, but I want to stress that I keep the same behavior of mainline when PREEMPT_RT is not set. Thus, this only changes the MCE behavior when PREEMPT_RT is configured.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy@linutronix: make mce_notify_work() a proper prototype, use kthread_run()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
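A sketch of the deferral (the helper thread and its name are illustrative; mce_chrdev_wait, mce_trigger_work and mce_need_notify are the existing mce.c objects): the hard interrupt only wakes a kthread, which then does the wait-queue wakeup and schedule_work() in schedulable context.

        static struct task_struct *mce_notify_helper;   /* started via kthread_run() */

        static int mce_notify_helper_thread(void *unused)
        {
                while (!kthread_should_stop()) {
                        set_current_state(TASK_INTERRUPTIBLE);
                        schedule();
                        __set_current_state(TASK_RUNNING);
                        /* sleeping locks are fine here */
                        wake_up_interruptible(&mce_chrdev_wait);
                        schedule_work(&mce_trigger_work);
                }
                return 0;
        }

        int mce_notify_irq(void)
        {
                if (test_and_clear_bit(0, &mce_need_notify)) {
                        /* hardirq context: just poke the helper thread */
                        wake_up_process(mce_notify_helper);
                        return 1;
                }
                return 0;
        }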
2015-02-13  x86/mce: fix mce timer interval  (Mike Galbraith)
Seems the mce timer has fired at the wrong frequency in -rt kernels since roughly forever, due to 32-bit overflow. 3.8-rt is also missing a multiplier. Add the missing us -> ns conversion and 32-bit overflow prevention. Cc: stable-rt@vger.kernel.org Signed-off-by: Mike Galbraith <bitbucket@online.de> [bigeasy: use ULL instead of u64 cast] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
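The arithmetic in question, sketched (variable names are illustrative): do the us -> ns conversion with 64-bit math so a multi-minute interval cannot wrap a 32-bit intermediate.

        /* interval kept in microseconds; the ULL keeps the multiply 64-bit */
        u64 iv_ns = interval_us * 1000ULL;              /* us -> ns */

        hrtimer_start(&mce_timer, ns_to_ktime(iv_ns), HRTIMER_MODE_REL);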
2015-02-13  x86: Convert mce timer to hrtimer  (Thomas Gleixner)
mce_timer is started in atomic contexts of cpu bringup. This results in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to avoid this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
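The shape of the conversion, as a sketch (per-CPU handling and error paths omitted; function names follow mce.c, but the hrtimer details here are illustrative):

        static struct hrtimer mce_timer;        /* per-CPU in the real code */

        static enum hrtimer_restart mce_timer_fn(struct hrtimer *t)
        {
                machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
                /* re-arm with 64-bit math to avoid the interval overflow */
                hrtimer_forward_now(t, ns_to_ktime((u64)check_interval * NSEC_PER_SEC));
                return HRTIMER_RESTART;
        }

        static void __mcheck_cpu_init_timer(void)
        {
                hrtimer_init(&mce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
                mce_timer.function = mce_timer_fn;
                hrtimer_start(&mce_timer,
                              ns_to_ktime((u64)check_interval * NSEC_PER_SEC),
                              HRTIMER_MODE_REL);
        }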
2015-02-13  softirq-disable-softirq-stacks-for-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  ARM: Initialize ptl->lock for vector page  (Frank Rowand)
Without this patch, ARM can not use SPLIT_PTLOCK_CPUS if PREEMPT_RT_FULL=y because vectors_user_mapping() creates a VM_ALWAYSDUMP mapping of the vector page (address 0xffff0000), but no ptl->lock has been allocated for the page. An attempt to coredump that page will result in a kernel NULL pointer dereference when follow_page() attempts to lock the page. The call tree to the NULL pointer dereference is:

do_notify_resume()
  get_signal_to_deliver()
    do_coredump()
      elf_core_dump()
        get_dump_page()
          __get_user_pages()
            follow_page()
              pte_offset_map_lock()   <----- a #define
                ...
                rt_spin_lock()

The underlying problem is exposed by mm-shrink-the-page-frame-to-rt-size.patch.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Frank <Frank_Rowand@sonyusa.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/4E87C535.2030907@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  kconfig-disable-a-few-options-rt.patch  (Thomas Gleixner)
Disable stuff which is known to have issues on RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  x86: Do not disable preemption in int3 on 32bit  (Steven Rostedt)
Preemption must be disabled before enabling interrupts in do_trap on x86_64 because the stack in use for int3 and debug is a per-CPU stack set by the IST. But 32bit does not have an IST and the stack still belongs to the current task, so there is no problem in scheduling out the task. Keep preemption enabled on X86_32 when enabling interrupts for do_trap(). The name of the function is changed from preempt_conditional_sti/cli() to conditional_sti/cli_ist(), to annotate that this function is used when the stack is on the IST. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
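A sketch of the renamed helpers (guarding only the IST case, as described; a simplified reading, not the exact patch):

        static void conditional_sti_ist(struct pt_regs *regs)
        {
        #ifdef CONFIG_X86_64
                /* 64-bit: int3/debug run on an IST stack, must not schedule */
                preempt_disable();
        #endif
                if (regs->flags & X86_EFLAGS_IF)
                        local_irq_enable();
        }

        static void conditional_cli_ist(struct pt_regs *regs)
        {
                if (regs->flags & X86_EFLAGS_IF)
                        local_irq_disable();
        #ifdef CONFIG_X86_64
                preempt_enable_no_resched();
        #endif
        }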
2015-02-13  x86: Do not unmask io_apic when interrupt is in progress  (Ingo Molnar)
With threaded interrupts we might see an interrupt in progress on migration. Do not unmask it when this is the case. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  mm: pagefault_disabled()  (Peter Zijlstra)
Wrap the test for pagefault_disabled() into a helper; this allows us to remove the need for current->pagefault_disabled on !-rt kernels. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3yy517m8zsi9fpsf14xfaqkw@git.kernel.org
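A sketch of what such a helper can look like (the exact -rt definition differs):

        static inline bool pagefault_disabled(void)
        {
        #ifdef CONFIG_PREEMPT_RT_FULL
                return current->pagefault_disabled != 0;
        #else
                /* !-rt: the information still lives in the preempt count */
                return in_atomic();
        #endif
        }

Fault handlers can then test pagefault_disabled() instead of open-coding in_atomic() plus the per-task flag.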
2015-02-13  mm: Fixup all fault handlers to check current->pagefault_disable  (Thomas Gleixner)
Necessary for decoupling pagefault disable from preempt count. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  ARM: AT91: PIT: Remove irq handler when clock event is unused  (Benedikt Spranger)
Setup and remove the interrupt handler in clock event mode selection. This avoids calling the (shared) interrupt handler when the device is not used. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: redo the patch with NR_IRQS_LEGACY which is probably required since commit 8fe82a55 ("ARM: at91: sparse irq support") which is included since v3.6. Patch based on what Sami Pietikäinen <Sami.Pietikainen@wapice.com> suggested]. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  signal/x86: Delay calling signals in atomic  (Oleg Nesterov)
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per-CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack are possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64 the signal information is stored in the task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Cc: stable-rt@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
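Sketch of the deferral point in force_sig_info() (condition simplified; the forced_info field follows the description above):

        #ifdef ARCH_RT_DELAYS_SIGNAL_SEND
                if (in_atomic()) {
                        /* cannot take the sleeping sighand lock here: stash
                         * the signal and deliver it on the way back to user */
                        current->forced_info = *info;
                        set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
                        return 0;
                }
        #endif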
2015-02-13  preempt-rt: Convert arm boot_lock to raw  (Frank Rowand)
The arm boot_lock is used by the secondary processor startup code. The locking task is the idle thread, which has idle->sched_class == &idle_sched_class. idle_sched_class->enqueue_task == NULL, so if the idle task blocks on the lock, the attempt to wake it when the lock becomes available will fail:

try_to_wake_up()
  ...
  activate_task()
    enqueue_task()
      p->sched_class->enqueue_task(rq, p, flags)

Fix by converting boot_lock to a raw spin lock.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4E77B952.3010606@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
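The conversion itself, roughly (repeated in each platform's platsmp.c):

        -static DEFINE_SPINLOCK(boot_lock);
        +static DEFINE_RAW_SPINLOCK(boot_lock);
        ...
        -       spin_lock(&boot_lock);
        +       raw_spin_lock(&boot_lock);
        ...
        -       spin_unlock(&boot_lock);
        +       raw_spin_unlock(&boot_lock);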
2015-02-13  mips-enable-interrupts-in-signal.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  Kind of revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"  (Wolfram Sang)
This more or less reverts commit 6391f697d4892a6f233501beea553e13f7745a23. Instead of adding an unneeded 'default', mark the variable to prevent the false positive 'uninitialized var'. The other change (fixing the printout) needs to be reverted, too: we want to know WHICH critical irq failed, not which level it had. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  sparc64: convert ctx_alloc_lock raw_spinlock_t  (Allen Pais)
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  sparc64: convert spinlock_t to raw_spinlock_t in mmu_context_t  (Allen Pais)
Issue debugged by Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  sparc64: use generic rwsem spinlocks rt  (Allen Pais)
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  sparc: provide EARLY_PRINTK for SPARC  (Kirill Tkhai)
sparc does not have a CONFIG_EARLY_PRINTK option, so early-printk-consolidate.patch breaks compilation:

arch/sparc/built-in.o: In function `setup_arch':
(.init.text+0x15e4): undefined reference to `early_console'
arch/sparc/built-in.o: In function `setup_arch':
(.init.text+0x15ec): undefined reference to `early_console'

The addition below fixes that.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-02-13  early-printk-consolidate.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  Reset to 3.12.37  (Scott Wood)
2014-05-14  powerpc-preempt-lazy-support.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  arm-preempt-lazy-support.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  x86-preempt-lazy.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT  (Thomas Gleixner)
We can't deal with the cpumask allocations which happen in atomic context (see arch/x86/kernel/apic/io_apic.c) on RT right now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  crypto: Reduce preempt disabled regions, more algos  (Sebastian Andrzej Siewior)
Don Estabrook reported:

| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()

and his backtrace showed some crypto functions which looked fine. The problem is the following sequence:

glue_xts_crypt_128bit()
{
        blkcipher_walk_virt();          /* normal migrate_disable() */

        glue_fpu_begin();               /* get atomic */

        while (nbytes) {
                __glue_xts_crypt_128bit();
                blkcipher_walk_done();  /* with nbytes = 0, migrate_enable()
                                         * while we are atomic */
        };
        glue_fpu_end()                  /* no longer atomic */
}

and this is why the counter gets out of sync and the warning is printed. The other problem is that we are non-preemptible between glue_fpu_begin() and glue_fpu_end(), so the latency grows. To fix this, I shorten the FPU-off region and ensure blkcipher_walk_done() is called with preemption enabled. This might hurt performance because we now enable/disable the FPU state more often, but we gain lower latency and the bug is gone.

Cc: stable-rt@vger.kernel.org
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14  x86: crypto: Reduce preempt disabled regions  (Peter Zijlstra)
Restrict the preempt-disabled regions to the actual floating point operations and enable preemption for the administrative actions. This is necessary on RT to avoid kfree and other operations being called with preemption disabled. Reported-and-tested-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  x86-kvm-require-const-tsc-for-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  arm-enable-highmem-for-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  arm/highmem: flush tlb on unmap  (Sebastian Andrzej Siewior)
The TLB should be flushed on unmap, thus making the mapping entry invalid. This is only done in the non-debug case, which does not look right. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14  x86/highmem: add a "already used pte" check  (Sebastian Andrzej Siewior)
This is a copy from kmap_atomic_prot(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14  mm, rt: kmap_atomic scheduling  (Peter Zijlstra)
In fact, with migrate_disable() existing, one could play games with kmap_atomic. You could save/restore the kmap_atomic slots on context switch (if there are any in use, of course); this should be especially easy now that we have a kmap_atomic stack. Something like the below... it wants all the preempt_disable() stuff replaced with pagefault_disable() && migrate_disable(), of course, but then you can flip kmaps around like below. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [dvhart@linux.intel.com: build fix] Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins [tglx@linutronix.de: Get rid of the per cpu variable and store the idx and the pte content right away in the task struct. Shortens the context switch code.]
2014-05-14  mips-disable-highmem-on-rt.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>