summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2014-03-05perf/x86: Fix event schedulingPeter Zijlstra
commit 26e61e8939b1fe8729572dabe9a9e97d930dd4f6 upstream. Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is I think the relevant bit: > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1 > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800) So we add the insn:p event (fd[23]). At this point we should have: n_events = 2, n_added = 1, n_txn = 1 > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800) > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00) We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so that's not visible. group_sched_in() pmu->start_txn() /* nop - BP pmu */ event_sched_in() event->pmu->add() So here we should end up with: 0: n_events = 3, n_added = 2, n_txn = 2 4: n_events = 4, n_added = 3, n_txn = 3 But seeing the below state on x86_pmu_enable(), the must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 or 4 are in the below state; their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2. However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed, its failure will have put group_sched_in() into the fail path, which will call: event_sched_out() event->pmu->del() on 0 and the BP event. Now x86_pmu_del() will reduce n_events; but it will not reduce n_added; giving what we see below: n_event = 2, n_added = 2, n_txn = 2 > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2 > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0 So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05x86: dma-mapping: fix GFP_ATOMIC macro usageMarek Szyprowski
commit c091c71ad2218fc50a07b3d1dab85783f3b77efd upstream. GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller wants to perform an atomic allocation, the code must test for a lack of the __GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc/crashdump : Fix page frame number check in copy_oldmem_pageLaurent Dufour
commit f5295bd8ea8a65dc5eac608b151386314cb978f1 upstream. In copy_oldmem_page, the current check using max_pfn and min_low_pfn to decide if the page is backed or not, is not valid when the memory layout is not continuous. This happens when running as a QEMU/KVM guest, where RTAS is mapped higher in the memory. In that case max_pfn points to the end of RTAS, and a hole between the end of the kdump kernel and RTAS is not backed by PTEs. As a consequence, the kdump kernel is crashing in copy_oldmem_page when accessing in a direct way the pages in that hole. This fix relies on the memblock's service memblock_is_region_memory to check if the read page is part or not of the directly accessible memory. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctlyTony Breeds
commit 41dd03a94c7d408d2ef32530545097f7d1befe5c upstream. Currently we're storing a host endian RTAS token in rtas_stop_self_args.token. We then pass that directly to rtas. This is fine on big endian however on little endian the token is not what we expect. This will typically result in hitting: panic("Alas, I survived.\n"); To fix this we always use the stop-self token in host order and always convert it to be32 before passing this to rtas. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc: Increase stack redzone for 64-bit userspace to 512 bytesPaul Mackerras
commit 573ebfa6601fa58b439e7f15828762839ccd306a upstream. The new ELFv2 little-endian ABI increases the stack redzone -- the area below the stack pointer that can be used for storing data -- from 288 bytes to 512 bytes. This means that we need to allow more space on the user stack when delivering a signal to a 64-bit process. To make the code a bit clearer, we define new USER_REDZONE_SIZE and KERNEL_REDZONE_SIZE symbols in ptrace.h. For now, we leave the kernel redzone size at 288 bytes, since increasing it to 512 bytes would increase the size of interrupt stack frames correspondingly. Gcc currently only makes use of 288 bytes of redzone even when compiling for the new little-endian ABI, and the kernel cannot currently be compiled with the new ABI anyway. In the future, hopefully gcc will provide an option to control the amount of redzone used, and then we could reduce it even more. This also changes the code in arch_compat_alloc_user_space() to preserve the expanded redzone. It is not clear why this function would ever be used on a 64-bit process, though. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05kvm: x86: fix emulator buffer overflow (CVE-2014-0049)Andrew Honig
commit a08d3b3b99efd509133946056531cdf8f3a0c09b upstream. The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0. As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and fourth to userspace incrementing mmio_cur_fragment past it's buffer. If the guest does nothing else it eventually leads to a a crash on a memcpy from invalid memory address. However if a guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: f78146b0f9230765c6315b2e14f56112513389ad Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05avr32: Makefile: add '-D__linux__' flag for gcc-4.4.7 useChen Gang
commit 8d80390cfc9434d5aa4fb9e5f9768a66b30cb8a6 upstream. For avr32 cross compiler, do not define '__linux__' internally, so it will cause issue with allmodconfig. The related error: CC [M] fs/coda/psdev.o In file included from include/linux/coda.h:64, from fs/coda/psdev.c:45: include/uapi/linux/coda.h:221: error: expected specifier-qualifier-list before 'u_quad_t' The related toolchain version (which only download, not re-compile): [root@gchen linux-next]# /upstream/toolchain/download/avr32-gnu-toolchain-linux_x86/bin/avr32-gcc -v Using built-in specs. Target: avr32 Configured with: /data2/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/src/gcc/configure --target=avr32 --host=i686-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-languages=c,c++ --disable-nls --disable-libssp --disable-libstdcxx-pch --with-dwarf2 --enable-version-specific-runtime-libs --disable-shared --enable-doc --with-mpfr-lib=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/lib --with-mpfr-include=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/include --with-gmp=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --with-mpc=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-__cxa_atexit --disable-shared --with-newlib --with-pkgversion=AVR_32_bit_GNU_Toolchain_3.4.2_435 --with-bugurl=http://www .atmel.com/avr Thread model: single gcc version 4.4.7 (AVR_32_bit_GNU_Toolchain_3.4.2_435) Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05avr32: fix missing module.h causing build failure in mimc200/fram.cPaul Gortmaker
commit 5745d6a41a4f4aec29e2ccd591c6fb09ed73a955 upstream. Causing this: In file included from arch/avr32/boards/mimc200/fram.c:13: include/linux/miscdevice.h:51: error: field 'list' has incomplete type include/linux/miscdevice.h:55: error: expected specifier-qualifier-list before 'mode_t' arch/avr32/boards/mimc200/fram.c:42: error: 'THIS_MODULE' undeclared here (not in a function) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc/powernv: Rework EEH resetGavin Shan
commit 5b2e198e50f6ba57081586b853163ea1bb95f1a8 upstream. When doing reset in order to recover the affected PE, we issue hot reset on PE primary bus if it's not root bus. Otherwise, we issue hot or fundamental reset on root port or PHB accordingly. For the later case, we didn't cover the situation where PE only includes root port and it potentially causes kernel crash upon EEH error to the PE. The patch reworks the logic of EEH reset to improve the code readability and also avoid the kernel crash. Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05powerpc: Set the correct ksp_limit on ppc32 when switching to irq stackKevin Hao
commit 1a18a66446f3f289b05b634f18012424d82aa63a upstream. Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef440000 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 <CE,EE,ME> CR: 24008444 XER: 00000000 GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000 GPR08: 00000000 020b8000 00000000 00000000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: OMAP2+: gpmc: fix: DT ONENAND child nodes not probed when MTD_ONENAND ↵Pekon Gupta
is built as module commit 980386d2d6d49e0b42f48550853ef1ad6aa5d79a upstream. Fixes: commit 75d3625e0e86b2d8d77b4e9c6f685fd7ea0d5a96 ARM: OMAP2+: gpmc: add DT bindings for OneNAND OMAP SoC(s) depend on GPMC controller driver to parse GPMC DT child nodes and register them platform_device for ONENAND driver to probe later. However this does not happen if generic MTD_ONENAND framework is built as module (CONFIG_MTD_ONENAND=m). Therefore, when MTD/ONENAND and MTD/ONENAND/OMAP2 modules are loaded, they are unable to find any matching platform_device and remain un-binded. This causes on board ONENAND flash to remain un-detected. This patch causes GPMC controller to parse DT nodes when CONFIG_MTD_ONENAND=y || CONFIG_MTD_ONENAND=m Signed-off-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: OMAP2+: gpmc: fix: DT NAND child nodes not probed when MTD_NAND is ↵Pekon Gupta
built as module commit 6b187b21c92b6e2c7e8ef0b450181c37a3f31681 upstream. Fixes: commit bc6b1e7b86f5d8e4a6fc1c0189e64bba4077efe0 ARM: OMAP: gpmc: add DT bindings for GPMC timings and NAND OMAP SoC(s) depend on GPMC controller driver to parse GPMC DT child nodes and register them platform_device for NAND driver to probe later. However this does not happen if generic MTD_NAND framework is built as module (CONFIG_MTD_NAND=m). Therefore, when MTD/NAND and MTD/NAND/OMAP2 modules are loaded, they are unable to find any matching platform_device and remain un-binded. This causes on board NAND flash to remain un-detected. This patch causes GPMC controller to parse DT nodes when CONFIG_MTD_NAND=y || CONFIG_MTD_NAND=m Signed-off-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: 7957/1: add DSB after icache flush in __flush_icache_all()Vinayak Kale
commit 39544ac9df20f73e49fc6b9ac19ff533388c82c0 upstream. Add DSB after icache flush to complete the cache maintenance operation. Signed-off-by: Vinayak Kale <vkale@apm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: 7955/1: spinlock: ensure we have a compiler barrier before sevWill Deacon
commit 7c8746a9eb287642deaad0e7c2cdf482dce5e4be upstream. When unlocking a spinlock, we require the following, strictly ordered sequence of events: <barrier> /* dmb */ <unlock> <barrier> /* dsb */ <sev> Whilst the code does indeed reflect this in terms of the architecture, the final <barrier> + <sev> have been contracted into a single inline asm without a "memory" clobber, therefore the compiler is at liberty to reorder the unlock to the end of the above sequence. In such a case, a waiting CPU may be woken up before the lock has been unlocked, leading to extremely poor performance. This patch reworks the dsb_sev() function to make use of the dsb() macro and ensure ordering against the unlock. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: 7953/1: mm: ensure TLB invalidation is complete before enabling MMUWill Deacon
commit bae0ca2bc550d1ec6a118fb8f2696f18c4da3d8e upstream. During __v{6,7}_setup, we invalidate the TLBs since we are about to enable the MMU on return to head.S. Unfortunately, without a subsequent dsb instruction, the invalidation is not guaranteed to have completed by the time we write to the sctlr, potentially exposing us to junk/stale translations cached in the TLB. This patch reworks the init functions so that the dsb used to ensure completion of cache/predictor maintenance is also used to ensure completion of the TLB invalidation. Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05ARM: dma-mapping: fix GFP_ATOMIC macro usageMarek Szyprowski
commit 10c8562f932d89c030083e15f9279971ed637136 upstream. GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other flags and LACK of __GFP_WAIT flag. To check if caller wanted to perform an atomic allocation, the code must test __GFP_WAIT flag presence. This patch fixes the issue introduced in v3.6-rc5 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-02-22ftrace/x86: Use breakpoints for converting function graph callerSteven Rostedt (Red Hat)
commit 87fbb2ac6073a7039303517546a76074feb14c84 upstream. When the conversion was made to remove stop machine and use the breakpoint logic instead, the modification of the function graph caller is still done directly as though it was being done under stop machine. As it is not converted via stop machine anymore, there is a possibility that the code could be layed across cache lines and if another CPU is accessing that function graph call when it is being updated, it could cause a General Protection Fault. Convert the update of the function graph caller to use the breakpoint method as well. Cc: H. Peter Anvin <hpa@zytor.com> Fixes: 08d636b6d4fb "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is offH. Peter Anvin
commit 4640c7ee9b8953237d05a61ea3ea93981d1bc961 upstream. If CONFIG_X86_SMAP is disabled, smap_violation() tests for conditions which are incorrect (as the AC flag doesn't matter), causing spurious faults. The dynamic disabling of SMAP (nosmap on the command line) is fine because it disables X86_FEATURE_SMAP, therefore causing the static_cpu_has() to return false. Found by Fengguang Wu's test system. [ v3: move all predicates into smap_violation() ] [ v2: use IS_ENABLED() instead of #ifdef ] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabledH. Peter Anvin
commit 03bbd596ac04fef47ce93a730b8f086d797c3021 upstream. If SMAP support is not compiled into the kernel, don't enable SMAP in CR4 -- in fact, we should clear it, because the kernel doesn't contain the proper STAC/CLAC instructions for SMAP support. Found by Fengguang Wu's test system. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22s390: fix kernel crash due to linkage stack instructionsMartin Schwidefsky
commit 8d7f6690cedb83456edd41c9bd583783f0703bf0 upstream. The kernel currently crashes with a low-address-protection exception if a user space process executes an instruction that tries to use the linkage stack. Set the base-ASTE origin and the subspace-ASTE origin of the dispatchable-unit-control-table to point to a dummy ASTE. Set up control register 15 to point to an empty linkage stack with no room left. A user space process with a linkage stack instruction will still crash but with a different exception which is correctly translated to a segmentation fault instead of a kernel oops. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22s390/dump: Fix dump memory detectionMichael Holzheu
commit d7736ff5be31edaa4fe5ab62810c64529a24b149 upstream. Dumps created by kdump or zfcpdump can contain invalid memory holes when dumping z/VM systems that have memory pressure. For example: # zgetdump -i /proc/vmcore. Memory map: 0000000000000000 - 0000000000bfffff (12 MB) 0000000000e00000 - 00000000014fffff (7 MB) 000000000bd00000 - 00000000f3bfffff (3711 MB) The memory detection function find_memory_chunks() issues tprot to find valid memory chunks. In case of CMM it can happen that pages are marked as unstable via set_page_unstable() in arch_free_page(). If z/VM has released that pages, tprot returns -EFAULT and indicates a memory hole. So fix this and switch off CMM in case of kdump or zfcpdump. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22xen: properly account for _PAGE_NUMA during xen pte translationsMel Gorman
commit a9c8e4beeeb64c22b84c803747487857fe424b68 upstream. Steven Noonan forwarded a users report where they had a problem starting vsftpd on a Xen paravirtualized guest, with this in dmesg: BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067 page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x2ffc0000000014(referenced|dirty) addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74 CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1 Call Trace: dump_stack+0x45/0x56 print_bad_pte+0x22e/0x250 unmap_single_vma+0x583/0x890 unmap_vmas+0x65/0x90 exit_mmap+0xc5/0x170 mmput+0x65/0x100 do_exit+0x393/0x9e0 do_group_exit+0xcc/0x140 SyS_exit_group+0x14/0x20 system_call_fastpath+0x1a/0x1f Disabling lock debugging due to kernel taint BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1 The issue could not be reproduced under an HVM instance with the same kernel, so it appears to be exclusive to paravirtual Xen guests. He bisected the problem to commit 1667918b6483 ("mm: numa: clear numa hinting information on mprotect") that was also included in 3.12-stable. The problem was related to how xen translates ptes because it was not accounting for the _PAGE_NUMA bit. This patch splits pte_present to add a pteval_present helper for use by xen so both bare metal and xen use the same code when checking if a PTE is present. [mgorman@suse.de: wrote changelog, proposed minor modifications] [akpm@linux-foundation.org: fix typo in comment] Reported-by: Steven Noonan <steven@uplinklabs.net> Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20x86: mm: change tlb_flushall_shift for IvyBridgeMel Gorman
commit f98b7a772ab51b52ca4d2a14362fc0e0c8a2e0f3 upstream. There was a large performance regression that was bisected to commit 611ae8e3 ("x86/tlb: enable tlb flush range support for x86"). This patch simply changes the default balance point between a local and global flush for IvyBridge. In the interest of allowing the tests to be reproduced, this patch was tested using mmtests 0.15 with the following configurations configs/config-global-dhp__tlbflush-performance configs/config-global-dhp__scheduler-performance configs/config-global-dhp__network-performance Results are from two machines Ivybridge 4 threads: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz Ivybridge 8 threads: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Page fault microbenchmark showed nothing interesting. Ebizzy was configured to run multiple iterations and threads. Thread counts ranged from 1 to NR_CPUS*2. For each thread count, it ran 100 iterations and each iteration lasted 10 seconds. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 6395.44 ( 0.00%) 6789.09 ( 6.16%) Mean 2 7012.85 ( 0.00%) 8052.16 ( 14.82%) Mean 3 6403.04 ( 0.00%) 6973.74 ( 8.91%) Mean 4 6135.32 ( 0.00%) 6582.33 ( 7.29%) Mean 5 6095.69 ( 0.00%) 6526.68 ( 7.07%) Mean 6 6114.33 ( 0.00%) 6416.64 ( 4.94%) Mean 7 6085.10 ( 0.00%) 6448.51 ( 5.97%) Mean 8 6120.62 ( 0.00%) 6462.97 ( 5.59%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 7336.65 ( 0.00%) 7787.02 ( 6.14%) Mean 2 8218.41 ( 0.00%) 9484.13 ( 15.40%) Mean 3 7973.62 ( 0.00%) 8922.01 ( 11.89%) Mean 4 7798.33 ( 0.00%) 8567.03 ( 9.86%) Mean 5 7158.72 ( 0.00%) 8214.23 ( 14.74%) Mean 6 6852.27 ( 0.00%) 7952.45 ( 16.06%) Mean 7 6774.65 ( 0.00%) 7536.35 ( 11.24%) Mean 8 6510.50 ( 0.00%) 6894.05 ( 5.89%) Mean 12 6182.90 ( 0.00%) 6661.29 ( 7.74%) Mean 16 6100.09 ( 0.00%) 6608.69 ( 8.34%) Ebizzy hits the worst case scenario for TLB range flushing every time and it shows for these Ivybridge CPUs at least that the default choice is a poor on. The patch addresses the problem. Next was a tlbflush microbenchmark written by Alex Shi at http://marc.info/?l=linux-kernel&m=133727348217113 . It measures access costs while the TLB is being flushed. The expectation is that if there are always full TLB flushes that the benchmark would suffer and it benefits from range flushing There are 320 iterations of the test per thread count. The number of entries is randomly selected with a min of 1 and max of 512. To ensure a reasonably even spread of entries, the full range is broken up into 8 sections and a random number selected within that section. iteration 1, random number between 0-64 iteration 2, random number between 64-128 etc This is still a very weak methodology. When you do not know what are typical ranges, random is a reasonable choice but it can be easily argued that the opimisation was for smaller ranges and an even spread is not representative of any workload that matters. To improve this, we'd need to know the probability distribution of TLB flush range sizes for a set of workloads that are considered "common", build a synthetic trace and feed that into this benchmark. Even that is not perfect because it would not account for the time between flushes but there are limits of what can be reasonably done and still be doing something useful. If a representative synthetic trace is provided then this benchmark could be revisited and the shift values retuned. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 10.50 ( 0.00%) 10.50 ( 0.03%) Mean 2 17.59 ( 0.00%) 17.18 ( 2.34%) Mean 3 22.98 ( 0.00%) 21.74 ( 5.41%) Mean 5 47.13 ( 0.00%) 46.23 ( 1.92%) Mean 8 43.30 ( 0.00%) 42.56 ( 1.72%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 9.45 ( 0.00%) 9.36 ( 0.93%) Mean 2 9.37 ( 0.00%) 9.70 ( -3.54%) Mean 3 9.36 ( 0.00%) 9.29 ( 0.70%) Mean 5 14.49 ( 0.00%) 15.04 ( -3.75%) Mean 8 41.08 ( 0.00%) 38.73 ( 5.71%) Mean 13 32.04 ( 0.00%) 31.24 ( 2.49%) Mean 16 40.05 ( 0.00%) 39.04 ( 2.51%) For both CPUs, average access time is reduced which is good as this is the benchmark that was used to tune the shift values in the first place albeit it is now known *how* the benchmark was used. The scheduler benchmarks were somewhat inconclusive. They showed gains and losses and makes me reconsider how stable those benchmarks really are or if something else might be interfering with the test results recently. Network benchmarks were inconclusive. Almost all results were flat except for netperf-udp tests on the 4 thread machine. These results were unstable and showed large variations between reboots. It is unknown if this is a recent problems but I've noticed before that netperf-udp results tend to vary. Based on these results, changing the default for Ivybridge seems like a logical choice. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Alex Shi <alex.shi@linaro.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20arm64: add DSB after icache flush in __flush_icache_all()Vinayak Kale
commit 5044bad43ee573d0b6d90e3ccb7a40c2c7d25eb4 upstream. Add DSB after icache flush to complete the cache maintenance operation. The function __flush_icache_all() is used only for user space mappings and an ISB is not required because of an exception return before executing user instructions. An exception return would behave like an ISB. Signed-off-by: Vinayak Kale <vkale@apm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20arm64: vdso: fix coarse clock handlingNathan Lynch
commit 069b918623e1510e58dacf178905a72c3baa3ae4 upstream. When __kernel_clock_gettime is called with a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE clock id, it returns incorrectly to whatever the caller has placed in x2 ("ret x2" to return from the fast path). Fix this by saving x30/LR to x2 only in code that will call __do_get_tspec, restoring x30 afterward, and using a plain "ret" to return from the routine. Also: while the resulting tv_nsec value for CLOCK_REALTIME and CLOCK_MONOTONIC must be computed using intermediate values that are left-shifted by cs_shift (x12, set by __do_get_tspec), the results for coarse clocks should be calculated using unshifted values (xtime_coarse_nsec is in units of actual nanoseconds). The current code shifts intermediate values by x12 unconditionally, but x12 is uninitialized when servicing a coarse clock. Fix this by setting x12 to 0 once we know we are dealing with a coarse clock id. Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20arm64: Invalidate the TLB when replacing pmd entries during bootCatalin Marinas
commit a55f9929a9b257f84b6cc7b2397379cabd744a22 upstream. With the 64K page size configuration, __create_page_tables in head.S maps enough memory to get started but using 64K pages rather than 512M sections with a single pgd/pud/pmd entry pointing to a pte table. create_mapping() may override the pgd/pud/pmd table entry with a block (section) one if the RAM size is more than 512MB and aligned correctly. For the end of this block to be accessible, the old TLB entry must be invalidated. Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20arm64: vdso: prevent ld from aligning PT_LOAD segments to 64kWill Deacon
commit 40507403485fcb56b83d6ddfc954e9b08305054c upstream. Whilst the text segment for our VDSO is marked as PT_LOAD in the ELF headers, it is mapped by the kernel and not actually subject to demand-paging. ld doesn't realise this, and emits a p_align field of 64k (the maximum supported page size), which conflicts with the load address picked by the kernel on 4k systems, which will be 4k aligned. This causes GDB to fail with "Failed to read a valid object file image from memory" when attempting to load the VDSO. This patch passes the -n option to ld, which prevents it from aligning PT_LOAD segments to the maximum page size. Reported-by: Kyle McMartin <kyle@redhat.com> Acked-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSENathan Lynch
commit d4022a335271a48cce49df35d825897914fbffe3 upstream. Update wall-to-monotonic fields in the VDSO data page unconditionally. These are used to service CLOCK_MONOTONIC_COARSE, which is not guarded by use_syscall. Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20crypto: s390 - fix des and des3_ede ctr concurrency issueHarald Freudenberger
commit ee97dc7db4cbda33e4241c2d85b42d1835bc8a35 upstream. In s390 des and 3des ctr mode there is one preallocated page used to speed up the en/decryption. This page is not protected against concurrent usage and thus there is a potential of data corruption with multiple threads. The fix introduces locking/unlocking the ctr page and a slower fallback solution at concurrency situations. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20crypto: s390 - fix des and des3_ede cbc concurrency issueHarald Freudenberger
commit adc3fcf1552b6e406d172fd9690bbd1395053d13 upstream. In s390 des and des3_ede cbc mode the iv value is not protected against concurrency access and modifications from another running en/decrypt operation which is using the very same tfm struct instance. This fix copies the iv to the local stack before the crypto operation and stores the value back when done. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20crypto: s390 - fix concurrency issue in aes-ctr modeHarald Freudenberger
commit 0519e9ad89e5cd6e6b08398f57c6a71d9580564c upstream. The aes-ctr mode uses one preallocated page without any concurrency protection. When multiple threads run aes-ctr encryption or decryption this can lead to data corruption. The patch introduces locking for the page and a fallback solution with slower en/decryption performance in concurrency situations. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13xtensa: xtfpga: fix definitions of platform devicesMax Filippov
commit a558d99263936b8a67d4eff8918745a77bfd8c31 upstream. Remove __initdata attribute, as the devices may be used after init sections are freed. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13tile: remove compat_sys_lookup_dcookie declaration to fix compile errorHeiko Carstens
commit 5a5e75f4714a592f31e57f248b8f5c866f278b8d upstream. With commit d8d14bd09cdd ("fs/compat: fix lookup_dcookie() parameter handling") I changed the type of the len parameter of the lookup_dcookie() syscall. However I missed that there was still a stale declaration in arch/tile/.. which now causes a compile error on tile: In file included from fs/dcookies.c:28:0: include/linux/compat.h:425:17: error: conflicting types for 'compat_sys_lookup_dcookie' fs/dcookies.c:207:1: error: conflicting types for 'compat_sys_lookup_dcookie' Simply remove the declaration in the tile architecture, which is only a leftover from before the different compat lookup_dcookie() versions have been merged. The correct declaration is now in include/linux/compat.h The build error was reported by Fenguang's build bot. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13ARM: mvebu: Fix kernel hang in mvebu_soc_id_init() when of_iomap failedGregory CLEMENT
commit dc4910d9e93f8cc56b190dd8fc9e789135978216 upstream. When pci_base is accessed whereas it has not been properly mapped by of_iomap() the kernel hang. The check of this pointer made an improper use of IS_ERR() instead of comparing to NULL. This patch fix this issue. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Fixes: 930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support) Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13ARM: orion: provide C-style interrupt handler for MULTI_IRQ_HANDLERSebastian Hesselbarth
commit f28d7de6bd4d41774744e011141945affa127da4 upstream. DT-enabled Marvell Kirkwood and Dove SoCs make use of an irqchip driver. As expected for irqchip drivers, it uses a C-style interrupt handler and therefore selects MULTI_IRQ_HANDLER. Now, compiling a kernel with both non-DT and DT support enabled, selecting MULTI_IRQ_HANDLER will break ASM irq handler used by non-DT boards. Therefore, we provide a C-style irq handler even for non-DT boards, if MULTI_IRQ_HANDLER is set. By installing the C-style irq handler in orion_irq_init this is transparent to all non-DT board files. While the regression report was filed on Marvell Kirkwood, also Marvell Dove non-DT boards are affected and fixed by this patch. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Tested-by: Ian Campbell <ijc@hellion.org.uk> Reported-by: Ian Campbell <ijc@hellion.org.uk> Fixes: 2326f04321a9 ("ARM: kirkwood: convert to DT irqchip and clocksource") Fixes: f07d73e33d0e ("ARM: dove: convert to DT irqchip and clocksource") Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13mm: don't lose the SOFT_DIRTY flag on mprotectAndrey Vagin
commit 24f91eba18bbfdb27e71a1aae5b3a61b67fcd091 upstream. The SOFT_DIRTY bit shows that the content of memory was changed after a defined point in the past. mprotect() doesn't change the content of memory, so it must not change the SOFT_DIRTY bit. This bug causes a malfunction: on the first iteration all pages are dumped. On other iterations only pages with the SOFT_DIRTY bit are dumped. So if the SOFT_DIRTY bit is cleared from a page by mistake, the page is not dumped and its content will be restored incorrectly. This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify() is called only for present pages. Fixes commit 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes tracking"). Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Borislav Petkov <bp@suse.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).Konrad Rzeszutek Wilk
commit 51c71a3bbaca868043cc45b3ad3786dd48a90235 upstream. The user has the option of disabling the platform driver: 00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01) which is used to unplug the emulated drivers (IDE, Realtek 8169, etc) and allow the PV drivers to take over. If the user wishes to disable that they can set: xen_platform_pci=0 (in the guest config file) or xen_emul_unplug=never (on the Linux command line) except it does not work properly. The PV drivers still try to load and since the Xen platform driver is not run - and it has not initialized the grant tables, most of the PV drivers stumble upon: input: Xen Virtual Keyboard as /devices/virtual/input/input5 input: Xen Virtual Pointer as /devices/virtual/input/input6M ------------[ cut here ]------------ kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206! invalid opcode: 0000 [#1] SMP Modules linked in: xen_kbdfront(+) xenfs xen_privcmd CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1 Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013 RIP: 0010:[<ffffffff813ddc40>] [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300 Call Trace: [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240 [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70 [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront] [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront] [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130 [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50 [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230 [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0 [<ffffffff8145e7d9>] driver_attach+0x19/0x20 [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220 [<ffffffff8145f1ff>] driver_register+0x5f/0xf0 [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20 [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40 [<ffffffffa0015000>] ? 0xffffffffa0014fff [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront] [<ffffffff81002049>] do_one_initcall+0x49/0x170 .. snip.. which is hardly nice. This patch fixes this by having each PV driver check for: - if running in PV, then it is fine to execute (as that is their native environment). - if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers. - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers. - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks', then bail out for all PV devices _except_ the block one. Ditto for the network one ('nics'). - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary' then load block PV driver, and also setup the legacy IDE paths. In (v3) make it actually load PV drivers. Reported-by: Sander Eikelenboom <linux@eikelenboom.it Reported-by: Anthony PERARD <anthony.perard@citrix.com> Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Add extra logic to handle the myrid ways 'xen_emul_unplug' can be used per Ian and Stefano suggestion] [v3: Make the unnecessary case work properly] [v4: s/disks/ide-disks/ spotted by Fabio] Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13arch/sh/kernel/kgdb.c: add missing #include <linux/sched.h>Wanlong Gao
commit 53a52f17d96c8d47c79a7dafa81426317e89c7c1 upstream. arch/sh/kernel/kgdb.c: In function 'sleeping_thread_to_gdb_regs': arch/sh/kernel/kgdb.c:225:32: error: implicit declaration of function 'task_stack_page' [-Werror=implicit-function-declaration] arch/sh/kernel/kgdb.c:242:23: error: dereferencing pointer to incomplete type arch/sh/kernel/kgdb.c:243:22: error: dereferencing pointer to incomplete type arch/sh/kernel/kgdb.c: In function 'singlestep_trap_handler': arch/sh/kernel/kgdb.c:310:27: error: 'SIGTRAP' undeclared (first use in this function) arch/sh/kernel/kgdb.c:310:27: note: each undeclared identifier is reported only once for each function it appears in This was introduced by commit 16559ae48c76 ("kgdb: remove #include <linux/serial_8250.h> from kgdb.h"). [geert@linux-m68k.org: reworded and reformatted] Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06x86, cpu, amd: Add workaround for family 16h, erratum 793Borislav Petkov
commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream. This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06powerpc: Make sure "cache" directory is removed when offlining cpuPaul Mackerras
commit 91b973f90c1220d71923e7efe1e61f5329806380 upstream. The code in remove_cache_dir() is supposed to remove the "cache" subdirectory from the sysfs directory for a CPU when that CPU is being offlined. It tries to do this by calling kobject_put() on the kobject for the subdirectory. However, the subdirectory only gets removed once the last reference goes away, and the reference being put here may well not be the last reference. That means that the "cache" subdirectory may still exist when the offlining operation has finished. If the same CPU subsequently gets onlined, the code tries to add a new "cache" subdirectory. If the old subdirectory has not yet been removed, we get a WARN_ON in the sysfs code, with stack trace, and an error message printed on the console. Further, we ultimately end up with an online cpu with no "cache" subdirectory. This fixes it by doing an explicit kobject_del() at the point where we want the subdirectory to go away. kobject_del() removes the sysfs directory even though the object still exists in memory. The object will get freed at some point in the future. A subsequent onlining operation can create a new sysfs directory, even if the old object still exists in memory, without causing any problems. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06powerpc: Fix the setup of CPU-to-Node mappings during CPU onlineSrivatsa S. Bhat
commit d4edc5b6c480a0917e61d93d55531d7efa6230be upstream. On POWER platforms, the hypervisor can notify the guest kernel about dynamic changes in the cpu-numa associativity (VPHN topology update). Hence the cpu-to-node mappings that we got from the firmware during boot, may no longer be valid after such updates. This is handled using the arch_update_cpu_topology() hook in the scheduler, and the sched-domains are rebuilt according to the new mappings. But unfortunately, at the moment, CPU hotplug ignores these updated mappings and instead queries the firmware for the cpu-to-numa relationships and uses them during CPU online. So the kernel can end up assigning wrong NUMA nodes to CPUs during subsequent CPU hotplug online operations (after booting). Further, a particularly problematic scenario can result from this bug: On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8) threads per core. The switch to Single-Threaded (ST) mode is performed by offlining all except the first CPU thread in each core. Switching back to SMT mode involves onlining those other threads back, in each core. Now consider this scenario: 1. During boot, the kernel gets the cpu-to-node mappings from the firmware and assigns the CPUs to NUMA nodes appropriately, during CPU online. 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and communicates this update to the kernel. The kernel in turn updates its cpu-to-node associations and rebuilds its sched domains. Everything is fine so far. 3. Now, the user switches the machine from SMT to ST mode (say, by running ppc64_cpu --smt=1). This involves offlining all except 1 thread in each core. 4. The user then tries to switch back from ST to SMT mode (say, by running ppc64_cpu --smt=4), and this involves onlining those threads back. Since CPU hotplug ignores the new mappings, it queries the firmware and tries to associate the newly onlined sibling threads to the old NUMA nodes. This results in sibling threads within the same core getting associated with different NUMA nodes, which is incorrect. The scheduler's build-sched-domains code gets thoroughly confused with this and enters an infinite loop and causes soft-lockups, as explained in detail in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings). So to fix this, use the numa_cpu_lookup_table to remember the updated cpu-to-node mappings, and use them during CPU hotplug online operations. Further, we also need to ensure that all threads in a core are assigned to a common NUMA node, irrespective of whether all those threads were online during the topology update. To achieve this, we take care not to use cpu_sibling_mask() since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread() and set up the mappings manually using the 'threads_per_core' value for that particular platform. This helps us ensure that we don't hit this bug with any combination of CPU hotplug and SMT mode switching. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()Mihai Caraman
commit 70713fe315ed14cd1bb07d1a7f33e973d136ae3d upstream. Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss(). Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06KVM: PPC: Book3S HV: use xics_wake_cpu only when definedAndreas Schwab
commit 48eaef0518a565d3852e301c860e1af6a6db5a84 upstream. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06parisc: fix cache-flushingHelge Deller
commit 57737c49dd72c96cfbcd4f66559f3ffc399aeb4f upstream. This commit: f8dae00684d678afa13041ef170cecfd1297ed40: parisc: Ensure full cache coherency for kmap/kunmap caused negative caching side-effects, e.g. hanging processes with expect and too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems. This patch now partly reverts it and has been in production use on our debian buildd makeservers since a week without any major problems. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06alpha: fix broken network checksumMikulas Patocka
commit 0ef38d70d4118b2ce1a538d14357be5ff9dc2bbd upstream. The patch 3ddc5b46a8e90f3c9251338b60191d0a804b0d92 breaks networking on alpha (there is a follow-up fix 5cfe8f1ba5eebe6f4b6e5858cdb1a5be4f3272a6, but networking is still broken even with the second patch). The patch 3ddc5b46a8e90f3c9251338b60191d0a804b0d92 makes csum_partial_copy_from_user check the pointer with access_ok. However, csum_partial_copy_from_user is called also from csum_partial_copy_nocheck and csum_partial_copy_nocheck is called on kernel pointers and it is supposed not to check pointer validity. This bug results in ssh session hangs if the system is loaded and bulk data are printed to ssh terminal. This patch fixes csum_partial_copy_nocheck to call set_fs(KERNEL_DS), so that access_ok in csum_partial_copy_from_user accepts kernel-space addresses. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructionsHeiko Carstens
[ Upstream commit 3af57f78c38131b7a66e2b01e06fdacae01992a3 ] The s390 bpf jit compiler emits the signed divide instructions "dr" and "d" for unsigned divisions. This can cause problems: the dividend will be zero extended to a 64 bit value and the divisor is the 32 bit signed value as specified A or X accumulator, even though A and X are supposed to be treated as unsigned values. The divide instrunctions will generate an exception if the result cannot be expressed with a 32 bit signed value. This is the case if e.g. the dividend is 0xffffffff and the divisor either 1 or also 0xffffffff (signed: -1). To avoid all these issues simply use unsigned divide instructions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06bpf: do not use reciprocal divideEric Dumazet
[ Upstream commit aee636c4809fa54848ff07a899b326eb1f9987a2 ] At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ARM: mvebu: Add quirk for i2c for the OpenBlocks AX3-4 boardGregory CLEMENT
commit 85e618a1be2b2092318178d1d66bdad49cbbeeeb upstream. The first variants of Armada XP SoCs (A0 stepping) have issues related to the i2c controller which prevent to use the offload mechanism and lead to a kernel hang during boot. This commit add quirk in the mvebu platform code to check the SoC version and then update the compatible string for the i2c controller according to the revision of the SoC. Currently only some OpenBlocks AX3-4 boards are known to use an A0 revision so the check is done only for these boards. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: 930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support) Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ARM: mvebu: Add support to get the ID and the revision of a SoCGregory CLEMENT
commit af8d1c63afcbf36eea06789c92e22d4af118d2fb upstream. All the mvebu SoCs have information related to their variant and revision that can be read from the PCI control register. This patch adds support for Armada XP and Armada 370. This reading of the revision and the ID are done before the PCI initialization to avoid any conflicts. Once these data are retrieved, the resources are freed to let the PCI subsystem use it. Fixes: 930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support) Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ARM: mvebu: update the SATA compatible string for Armada 370/XPSimon Guinot
commit a96cc303e42ad7830dde929aad0046e448a05505 upstream. This patch updates the Armada 370/XP SATA node with the new compatible string "marvell,armada-370-sata". Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Gregory Clement <gregory.clement@free-electrons.com> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: Lior Amsalem <alior@marvell.com> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>