summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2014-04-10ARM: Initialize ptl->lock for vector pageFrank Rowand
Without this patch, ARM can not use SPLIT_PTLOCK_CPUS if PREEMPT_RT_FULL=y because vectors_user_mapping() creates a VM_ALWAYSDUMP mapping of the vector page (address 0xffff0000), but no ptl->lock has been allocated for the page. An attempt to coredump that page will result in a kernel NULL pointer dereference when follow_page() attempts to lock the page. The call tree to the NULL pointer dereference is: do_notify_resume() get_signal_to_deliver() do_coredump() elf_core_dump() get_dump_page() __get_user_pages() follow_page() pte_offset_map_lock() <----- a #define ... rt_spin_lock() The underlying problem is exposed by mm-shrink-the-page-frame-to-rt-size.patch. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Cc: Frank <Frank_Rowand@sonyusa.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/4E87C535.2030907@am.sony.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10kconfig-disable-a-few-options-rt.patchThomas Gleixner
Disable stuff which is known to have issues on RT Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10x86: Do not disable preemption in int3 on 32bitSteven Rostedt
Preemption must be disabled before enabling interrupts in do_trap on x86_64 because the stack in use for int3 and debug is a per CPU stack set by th IST. But 32bit does not have an IST and the stack still belongs to the current task and there is no problem in scheduling out the task. Keep preemption enabled on X86_32 when enabling interrupts for do_trap(). The name of the function is changed from preempt_conditional_sti/cli() to conditional_sti/cli_ist(), to annotate that this function is used when the stack is on the IST. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10x86: Do not unmask io_apic when interrupt is in progressIngo Molnar
With threaded interrupts we might see an interrupt in progress on migration. Do not unmask it when this is the case. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10mm: pagefault_disabled()Peter Zijlstra
Wrap the test for pagefault_disabled() into a helper, this allows us to remove the need for current->pagefault_disabled on !-rt kernels. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3yy517m8zsi9fpsf14xfaqkw@git.kernel.org
2014-04-10mm: Fixup all fault handlers to check current->pagefault_disableThomas Gleixner
Necessary for decoupling pagefault disable from preempt count. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10ARM: AT91: PIT: Remove irq handler when clock event is unusedBenedikt Spranger
Setup and remove the interrupt handler in clock event mode selection. This avoids calling the (shared) interrupt handler when the device is not used. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: redo the patch with NR_IRQS_LEGACY which is probably required since commit 8fe82a55 ("ARM: at91: sparse irq support") which is included since v3.6. Patch based on what Sami Pietikäinen <Sami.Pietikainen@wapice.com> suggested]. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10signal/x86: Delay calling signals in atomicOleg Nesterov
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored on the stacks task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Cc: stable-rt@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10preempt-rt: Convert arm boot_lock to rawFrank Rowand
The arm boot_lock is used by the secondary processor startup code. The locking task is the idle thread, which has idle->sched_class == &idle_sched_class. idle_sched_class->enqueue_task == NULL, so if the idle task blocks on the lock, the attempt to wake it when the lock becomes available will fail: try_to_wake_up() ... activate_task() enqueue_task() p->sched_class->enqueue_task(rq, p, flags) Fix by converting boot_lock to a raw spin lock. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Link: http://lkml.kernel.org/r/4E77B952.3010606@am.sony.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10mips-enable-interrupts-in-signal.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10Kind of revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"Wolfram Sang
This more or less reverts commit 6391f697d4892a6f233501beea553e13f7745a23. Instead of adding an unneeded 'default', mark the variable to prevent the false positive 'uninitialized var'. The other change (fixing the printout) needs revert, too. We want to know WHICH critical irq failed, not which level it had. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc64: convert ctx_alloc_lock raw_spinlock_tAllen Pais
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc64: convert spinlock_t to raw_spinlock_t in mmu_context_tAllen Pais
Issue debugged by Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc64: use generic rwsem spinlocks rtAllen Pais
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc: provide EARLY_PRINTK for SPARCKirill Tkhai
sparc does not have CONFIG_EARLY_PRINTK option. So early-printk-consolidate.patch breaks compilation: arch/sparc/built-in.o: In function `setup_arch': (.init.text+0x15e4): undefined reference to `early_console' arch/sparc/built-in.o: In function `setup_arch': (.init.text+0x15ec): undefined reference to `early_console' The below addition fixes that. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10early-printk-consolidate.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-24arm64: mm: Add double logical invert to pte accessorsSteve Capper
commit 84fe6826c28f69d8708bd575faed7f75e6b6f57f upstream. Page table entries on ARM64 are 64 bits, and some pte functions such as pte_dirty return a bitwise-and of a flag with the pte value. If the flag to be tested resides in the upper 32 bits of the pte, then we run into the danger of the result being dropped if downcast. For example: gather_stats(page, md, pte_dirty(*pte), 1); where pte_dirty(*pte) is downcast to an int. This patch adds a double logical invert to all the pte_ accessors to ensure predictable downcasting. Signed-off-by: Steve Capper <steve.capper@linaro.org> [steve.capper@linaro.org: rebased patch to leave pte_write alone to allow for merge with 3.13 stable] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24MIPS: include linux/types.hQais Yousef
commit 87c99203fea897fbdd84b681ad9fced2517dcf98 upstream. The file uses u16 type but doesn't include its definition explicitly I was getting this error when including this header in my driver: arch/mips/include/asm/mipsregs.h:644:33: error: unknown type name ‘u16’ Signed-off-by: Qais Yousef <qais.yousef@imgtec.com> Reviewed-by: Steven J. Hill <Steven.Hill@imgtec.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: John Crispin <blogic@openwrt.org> Patchwork: http://patchwork.linux-mips.org/patch/6212/ Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPUSuresh Siddha
commit 731bd6a93a6e9172094a2322bd0ee964bb1f4d63 upstream. For non-eager fpu mode, thread's fpu state is allocated during the first fpu usage (in the context of device not available exception). This (math_state_restore()) can be a blocking call and hence we enable interrupts (which were originally disabled when the exception happened), allocate memory and disable interrupts etc. But the eager-fpu mode, call's the same math_state_restore() from kernel_fpu_end(). The assumption being that tsk_used_math() is always set for the eager-fpu mode and thus avoid the code path of enabling interrupts, allocating fpu state using blocking call and disable interrupts etc. But the below issue was noticed by Maarten Baert, Nate Eldredge and few others: If a user process dumps core on an ecrypt fs while aesni-intel is loaded, we get a BUG() in __find_get_block() complaining that it was called with interrupts disabled; then all further accesses to our ecrypt fs hang and we have to reboot. The aesni-intel code (encrypting the core file that we are writing) needs the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(), the latter of which calls math_state_restore(). So after kernel_fpu_end(), interrupts may be disabled, which nobody seems to expect, and they stay that way until we eventually get to __find_get_block() which barfs. For eager fpu, most the time, tsk_used_math() is true. At few instances during thread exit, signal return handling etc, tsk_used_math() might be false. In kernel_fpu_end(), for eager-fpu, call math_state_restore() only if tsk_used_math() is set. Otherwise, don't bother. Kernel code path which cleared tsk_used_math() knows what needs to be done with the fpu state. Reported-by: Maarten Baert <maarten-baert@hotmail.com> Reported-by: Nate Eldredge <nate@thatsmathematics.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa Cc: George Spelvin <linux@horizon.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24KVM: SVM: fix cr8 intercept windowRadim Krčmář
commit 596f3142d2b7be307a1652d59e7b93adab918437 upstream. We always disable cr8 intercept in its handler, but only re-enable it if handling KVM_REQ_EVENT, so there can be a window where we do not intercept cr8 writes, which allows an interrupt to disrupt a higher priority task. Fix this by disabling intercepts in the same function that re-enables them when needed. This fixes BSOD in Windows 2008. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-22x86/amd/numa: Fix northbridge quirk to assign correct NUMA nodeDaniel J Blueman
commit 847d7970defb45540735b3fb4e88471c27cacd85 upstream. For systems with multiple servers and routed fabric, all northbridges get assigned to the first server. Fix this by also using the node reported from the PCI bus. For single-fabric systems, the northbriges are on PCI bus 0 by definition, which are on NUMA node 0 by definition, so this is invarient on most systems. Tested on fam10h and fam15h single and multi-fabric systems and candidate for stable. Signed-off-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Steffen Persvold <sp@numascale.com> Acked-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-22x86: fix compile error due to X86_TRAP_NMI use in asm filesLinus Torvalds
commit b01d4e68933ec23e43b1046fa35d593cefcf37d1 upstream. It's an enum, not a #define, you can't use it in asm files. Introduced in commit 5fa10196bdb5 ("x86: Ignore NMIs that come in during early boot"), and sadly I didn't compile-test things like I should have before pushing out. My weak excuse is that the x86 tree generally doesn't introduce stupid things like this (and the ARM pull afterwards doesn't cause me to do a compile-test either, since I don't cross-compile). Cc: Don Zickus <dzickus@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-22x86: Ignore NMIs that come in during early bootH. Peter Anvin
commit 5fa10196bdb5f190f595ebd048490ee52dddea0f upstream. Don Zickus reports: A customer generated an external NMI using their iLO to test kdump worked. Unfortunately, the machine hung. Disabling the nmi_watchdog made things work. I speculated the external NMI fired, caused the machine to panic (as expected) and the perf NMI from the watchdog came in and was latched. My guess was this somehow caused the hang. ---- It appears that the latched NMI stays latched until the early page table generation on 64 bits, which causes exceptions to happen which end in IRET, which re-enable NMI. Therefore, ignore NMIs that come in during early execution, until we have proper exception handling. Reported-and-tested-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-22ARM: 7991/1: sa1100: fix compile problem on CollieLinus Walleij
commit 052450fdc55894a39fbae93d9bbe43947956f663 upstream. Due to a problem in the MFD Kconfig it was not possible to compile the UCB battery driver for the Collie SA1100 system, in turn making it impossible to compile in the battery driver. (See patch "mfd: include all drivers in subsystem menu".) After fixing the MFD Kconfig (separate patch) a compile error appears in the Collie battery driver due to the <mach/collie.h> implicitly requiring <mach/hardware.h> through <linux/gpio.h> via <mach/gpio.h> prior to commit 40ca061b "ARM: 7841/1: sa1100: remove complex GPIO interface". Fix this up by including the required header into <mach/collie.h>. Cc: Andrea Adami <andrea.adami@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-22powerpc: Align p_dyn, p_rela and p_st symbolsAnton Blanchard
commit a5b2cf5b1af424ee3dd9e3ce6d5cea18cb927e67 upstream. The 64bit relocation code places a few symbols in the text segment. These symbols are only 4 byte aligned where they need to be 8 byte aligned. Add an explicit alignment. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-22powerpc/tm: Fix crash when forking inside a transactionMichael Neuling
commit 621b5060e823301d0cba4cb52a7ee3491922d291 upstream. When we fork/clone we currently don't copy any of the TM state to the new thread. This results in a TM bad thing (program check) when the new process is switched in as the kernel does a tmrechkpt with TEXASR FS not set. Also, since R1 is from userspace, we trigger the bad kernel stack pointer detection. So we end up with something like this: Bad kernel stack pointer 0 at c0000000000404fc cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40] pc: c0000000000404fc: restore_gprs+0xc0/0x148 lr: 0000000000000000 sp: 0 msr: 9000000100201030 current = 0xc000001dd1417c30 paca = 0xc00000000fe00800 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/2 WARNING: exception is not recoverable, can't continue The below fixes this by flushing the TM state before we copy the task_struct to the clone. To do this we go through the tmreclaim patch, which removes the checkpointed registers from the CPU and transitions the CPU out of TM suspend mode. Hence we need to call tmrechkpt after to restore the checkpointed state and the TM mode for the current task. To make this fail from userspace is simply: tbegin li r0, 2 sc <boom> Kudos to Adhemerval Zanella Neto for finding this. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: Adhemerval Zanella Neto <azanella@br.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12x86/dumpstack: Fix printk_address for direct addressesJiri Slaby
commit 5f01c98859073cb512b01d4fad74b5f4e047be0b upstream. Consider a kernel crash in a module, simulated the following way: static int my_init(void) { char *map = (void *)0x5; *map = 3; return 0; } module_init(my_init); When we turn off FRAME_POINTERs, the very first instruction in that function causes a BUG. The problem is that we print IP in the BUG report using %pB (from printk_address). And %pB decrements the pointer by one to fix printing addresses of functions with tail calls. This was added in commit 71f9e59800e5ad4 ("x86, dumpstack: Use %pB format specifier for stack trace") to fix the call stack printouts. So instead of correct output: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173] We get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa0152000>] 0xffffffffa0151fff To fix that, we use %pS only for stack addresses printouts (via newly added printk_stack_address) and %pB for regs->ip (via printk_address). I.e. we revert to the old behaviour for all except call stacks. And since from all those reliable is 1, we remove that parameter from printk_address. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: joe@perches.com Cc: jirislaby@gmail.com Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12powerpc: Fix fatal SLB miss when restoring PPRBenjamin Herrenschmidt
commit 0c4888ef1d8a8b82c29075ce7e257ff795af15c7 upstream. When restoring the PPR value, we incorrectly access the thread structure at a time where MSR:RI is clear, which means we cannot recover from nested faults. However the thread structure isn't covered by the "bolted" SLB entries and thus accessing can fault. This fixes it by splitting the code so that the PPR value is loaded into a GPR before MSR:RI is cleared. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12s390/appldata: restore missing init_virt_timer()Gerald Schaefer
commit b7c5b1aa2836c933ab03f90391619ebdc9112e46 upstream. Commit 27f6b416 "s390/vtimer: rework virtual timer interface" removed the call to init_virt_timer() by mistake, which is added again by this patch. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12s390,time: revert direct ktime path for s390 clockevent deviceMartin Schwidefsky
commit 8adbf78ec4839c1dc4ff20c9a1f332a7bc99e6e6 upstream. Git commit 4f37a68cdaf6dea833cfdded2a3e0c47c0f006da "s390: Use direct ktime path for s390 clockevent device" makes use of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta calculation with ktime_get() in clockevents_program_event and the get_tod_clock() in s390_next_event. This is based on the assumption that the difference between the internal ktime and the hardware clock is reflected in the wall_to_monotonic delta. But this is not true, the ntp corrections are applied via changes to the tk->mult multiplier and this is not reflected in wall_to_monotonic. In theory this could be solved by using the raw monotonic clock but it is simpler to switch back to the standard clock delta calculation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12s390/time,vdso: convert to the new update_vsyscall interfaceMartin Schwidefsky
commit 79c74ecbebf76732f91b82a62ce7fc8a88326962 upstream. Switch to the improved update_vsyscall interface that provides sub-nanosecond precision for gettimeofday and clock_gettime. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM64: unwind: Fix PC calculationOlof Johansson
commit e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63 upstream. The frame PC value in the unwind code used to just take the saved LR value and use that. That's incorrect as a stack trace, since it shows the return path stack, not the call path stack. In particular, it shows faulty information in case the bl is done as the very last instruction of one label, since the return point will be in the next label. That can easily be seen with tail calls to panic(), which is marked __noreturn and thus doesn't have anything useful after it. Easiest here is to just correct the unwind code and do a -4, to get the actual call site for the backtrace instead of the return site. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05xtensa: introduce spill_registers_kernel macroMax Filippov
commit e2fd1374c705abe4661df3fb6fadb3879c7c1846 upstream. Most in-kernel users want registers spilled on the kernel stack and don't require PS.EXCM to be set. That means that they don't need fixup routine and could reuse regular window overflow mechanism for that, which makes spill routine very simple. Suggested-by: Chris Zankel <chris@zankel.net> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05xtensa: save current register frame in fast_syscall_spill_registers_fixupMax Filippov
commit 3251f1e27a5a17f0efd436cfd1e7b9896cfab0a0 upstream. We need it saved because it contains a3 where we track which register windows we still need to spill, and fixup handler may call C exception handlers. Also fix comments. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05perf/x86: Fix event schedulingPeter Zijlstra
commit 26e61e8939b1fe8729572dabe9a9e97d930dd4f6 upstream. Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is I think the relevant bit: > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1 > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800) So we add the insn:p event (fd[23]). At this point we should have: n_events = 2, n_added = 1, n_txn = 1 > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800) > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00) We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so that's not visible. group_sched_in() pmu->start_txn() /* nop - BP pmu */ event_sched_in() event->pmu->add() So here we should end up with: 0: n_events = 3, n_added = 2, n_txn = 2 4: n_events = 4, n_added = 3, n_txn = 3 But seeing the below state on x86_pmu_enable(), the must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 or 4 are in the below state; their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2. However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed, its failure will have put group_sched_in() into the fail path, which will call: event_sched_out() event->pmu->del() on 0 and the BP event. Now x86_pmu_del() will reduce n_events; but it will not reduce n_added; giving what we see below: n_event = 2, n_added = 2, n_txn = 2 > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2 > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0 So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05x86: dma-mapping: fix GFP_ATOMIC macro usageMarek Szyprowski
commit c091c71ad2218fc50a07b3d1dab85783f3b77efd upstream. GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller wants to perform an atomic allocation, the code must test for a lack of the __GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc/crashdump : Fix page frame number check in copy_oldmem_pageLaurent Dufour
commit f5295bd8ea8a65dc5eac608b151386314cb978f1 upstream. In copy_oldmem_page, the current check using max_pfn and min_low_pfn to decide if the page is backed or not, is not valid when the memory layout is not continuous. This happens when running as a QEMU/KVM guest, where RTAS is mapped higher in the memory. In that case max_pfn points to the end of RTAS, and a hole between the end of the kdump kernel and RTAS is not backed by PTEs. As a consequence, the kdump kernel is crashing in copy_oldmem_page when accessing in a direct way the pages in that hole. This fix relies on the memblock's service memblock_is_region_memory to check if the read page is part or not of the directly accessible memory. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctlyTony Breeds
commit 41dd03a94c7d408d2ef32530545097f7d1befe5c upstream. Currently we're storing a host endian RTAS token in rtas_stop_self_args.token. We then pass that directly to rtas. This is fine on big endian however on little endian the token is not what we expect. This will typically result in hitting: panic("Alas, I survived.\n"); To fix this we always use the stop-self token in host order and always convert it to be32 before passing this to rtas. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc: Increase stack redzone for 64-bit userspace to 512 bytesPaul Mackerras
commit 573ebfa6601fa58b439e7f15828762839ccd306a upstream. The new ELFv2 little-endian ABI increases the stack redzone -- the area below the stack pointer that can be used for storing data -- from 288 bytes to 512 bytes. This means that we need to allow more space on the user stack when delivering a signal to a 64-bit process. To make the code a bit clearer, we define new USER_REDZONE_SIZE and KERNEL_REDZONE_SIZE symbols in ptrace.h. For now, we leave the kernel redzone size at 288 bytes, since increasing it to 512 bytes would increase the size of interrupt stack frames correspondingly. Gcc currently only makes use of 288 bytes of redzone even when compiling for the new little-endian ABI, and the kernel cannot currently be compiled with the new ABI anyway. In the future, hopefully gcc will provide an option to control the amount of redzone used, and then we could reduce it even more. This also changes the code in arch_compat_alloc_user_space() to preserve the expanded redzone. It is not clear why this function would ever be used on a 64-bit process, though. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05kvm: x86: fix emulator buffer overflow (CVE-2014-0049)Andrew Honig
commit a08d3b3b99efd509133946056531cdf8f3a0c09b upstream. The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0. As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and fourth to userspace incrementing mmio_cur_fragment past it's buffer. If the guest does nothing else it eventually leads to a a crash on a memcpy from invalid memory address. However if a guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: f78146b0f9230765c6315b2e14f56112513389ad Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05avr32: Makefile: add '-D__linux__' flag for gcc-4.4.7 useChen Gang
commit 8d80390cfc9434d5aa4fb9e5f9768a66b30cb8a6 upstream. For avr32 cross compiler, do not define '__linux__' internally, so it will cause issue with allmodconfig. The related error: CC [M] fs/coda/psdev.o In file included from include/linux/coda.h:64, from fs/coda/psdev.c:45: include/uapi/linux/coda.h:221: error: expected specifier-qualifier-list before 'u_quad_t' The related toolchain version (which only download, not re-compile): [root@gchen linux-next]# /upstream/toolchain/download/avr32-gnu-toolchain-linux_x86/bin/avr32-gcc -v Using built-in specs. Target: avr32 Configured with: /data2/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/src/gcc/configure --target=avr32 --host=i686-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-languages=c,c++ --disable-nls --disable-libssp --disable-libstdcxx-pch --with-dwarf2 --enable-version-specific-runtime-libs --disable-shared --enable-doc --with-mpfr-lib=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/lib --with-mpfr-include=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/include --with-gmp=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --with-mpc=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-__cxa_atexit --disable-shared --with-newlib --with-pkgversion=AVR_32_bit_GNU_Toolchain_3.4.2_435 --with-bugurl=http://www .atmel.com/avr Thread model: single gcc version 4.4.7 (AVR_32_bit_GNU_Toolchain_3.4.2_435) Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05avr32: fix missing module.h causing build failure in mimc200/fram.cPaul Gortmaker
commit 5745d6a41a4f4aec29e2ccd591c6fb09ed73a955 upstream. Causing this: In file included from arch/avr32/boards/mimc200/fram.c:13: include/linux/miscdevice.h:51: error: field 'list' has incomplete type include/linux/miscdevice.h:55: error: expected specifier-qualifier-list before 'mode_t' arch/avr32/boards/mimc200/fram.c:42: error: 'THIS_MODULE' undeclared here (not in a function) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc/powernv: Rework EEH resetGavin Shan
commit 5b2e198e50f6ba57081586b853163ea1bb95f1a8 upstream. When doing reset in order to recover the affected PE, we issue hot reset on PE primary bus if it's not root bus. Otherwise, we issue hot or fundamental reset on root port or PHB accordingly. For the later case, we didn't cover the situation where PE only includes root port and it potentially causes kernel crash upon EEH error to the PE. The patch reworks the logic of EEH reset to improve the code readability and also avoid the kernel crash. Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc: Set the correct ksp_limit on ppc32 when switching to irq stackKevin Hao
commit 1a18a66446f3f289b05b634f18012424d82aa63a upstream. Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef440000 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 <CE,EE,ME> CR: 24008444 XER: 00000000 GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000 GPR08: 00000000 020b8000 00000000 00000000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: OMAP2+: gpmc: fix: DT ONENAND child nodes not probed when MTD_ONENAND ↵Pekon Gupta
is built as module commit 980386d2d6d49e0b42f48550853ef1ad6aa5d79a upstream. Fixes: commit 75d3625e0e86b2d8d77b4e9c6f685fd7ea0d5a96 ARM: OMAP2+: gpmc: add DT bindings for OneNAND OMAP SoC(s) depend on GPMC controller driver to parse GPMC DT child nodes and register them platform_device for ONENAND driver to probe later. However this does not happen if generic MTD_ONENAND framework is built as module (CONFIG_MTD_ONENAND=m). Therefore, when MTD/ONENAND and MTD/ONENAND/OMAP2 modules are loaded, they are unable to find any matching platform_device and remain un-binded. This causes on board ONENAND flash to remain un-detected. This patch causes GPMC controller to parse DT nodes when CONFIG_MTD_ONENAND=y || CONFIG_MTD_ONENAND=m Signed-off-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: OMAP2+: gpmc: fix: DT NAND child nodes not probed when MTD_NAND is ↵Pekon Gupta
built as module commit 6b187b21c92b6e2c7e8ef0b450181c37a3f31681 upstream. Fixes: commit bc6b1e7b86f5d8e4a6fc1c0189e64bba4077efe0 ARM: OMAP: gpmc: add DT bindings for GPMC timings and NAND OMAP SoC(s) depend on GPMC controller driver to parse GPMC DT child nodes and register them platform_device for NAND driver to probe later. However this does not happen if generic MTD_NAND framework is built as module (CONFIG_MTD_NAND=m). Therefore, when MTD/NAND and MTD/NAND/OMAP2 modules are loaded, they are unable to find any matching platform_device and remain un-binded. This causes on board NAND flash to remain un-detected. This patch causes GPMC controller to parse DT nodes when CONFIG_MTD_NAND=y || CONFIG_MTD_NAND=m Signed-off-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: 7957/1: add DSB after icache flush in __flush_icache_all()Vinayak Kale
commit 39544ac9df20f73e49fc6b9ac19ff533388c82c0 upstream. Add DSB after icache flush to complete the cache maintenance operation. Signed-off-by: Vinayak Kale <vkale@apm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: 7955/1: spinlock: ensure we have a compiler barrier before sevWill Deacon
commit 7c8746a9eb287642deaad0e7c2cdf482dce5e4be upstream. When unlocking a spinlock, we require the following, strictly ordered sequence of events: <barrier> /* dmb */ <unlock> <barrier> /* dsb */ <sev> Whilst the code does indeed reflect this in terms of the architecture, the final <barrier> + <sev> have been contracted into a single inline asm without a "memory" clobber, therefore the compiler is at liberty to reorder the unlock to the end of the above sequence. In such a case, a waiting CPU may be woken up before the lock has been unlocked, leading to extremely poor performance. This patch reworks the dsb_sev() function to make use of the dsb() macro and ensure ordering against the unlock. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: 7953/1: mm: ensure TLB invalidation is complete before enabling MMUWill Deacon
commit bae0ca2bc550d1ec6a118fb8f2696f18c4da3d8e upstream. During __v{6,7}_setup, we invalidate the TLBs since we are about to enable the MMU on return to head.S. Unfortunately, without a subsequent dsb instruction, the invalidation is not guaranteed to have completed by the time we write to the sctlr, potentially exposing us to junk/stale translations cached in the TLB. This patch reworks the init functions so that the dsb used to ensure completion of cache/predictor maintenance is also used to ensure completion of the TLB invalidation. Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: dma-mapping: fix GFP_ATOMIC macro usageMarek Szyprowski
commit 10c8562f932d89c030083e15f9279971ed637136 upstream. GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other flags and LACK of __GFP_WAIT flag. To check if caller wanted to perform an atomic allocation, the code must test __GFP_WAIT flag presence. This patch fixes the issue introduced in v3.6-rc5 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>