summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2013-09-06Merge tag 'soc-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform changes from Olof Johansson: "This branch contains mostly additions and changes to platform enablement and SoC-level drivers. Since there's sometimes a dependency on device-tree changes, there's also a fair amount of those in this branch. Pieces worth mentioning are: - Mbus driver for Marvell platforms, allowing kernel configuration and resource allocation of on-chip peripherals. - Enablement of the mbus infrastructure from Marvell PCI-e drivers. - Preparation of MSI support for Marvell platforms. - Addition of new PCI-e host controller driver for Tegra platforms - Some churn caused by sharing of macro names between i.MX 6Q and 6DL platforms in the device tree sources and header files. - Various suspend/PM updates for Tegra, including LP1 support. - Versatile Express support for MCPM, part of big little support. - Allwinner platform support for A20 and A31 SoCs (dual and quad Cortex-A7) - OMAP2+ support for DRA7, a new Cortex-A15-based SoC. The code that touches other architectures are patches moving MSI arch-specific functions over to weak symbols and removal of ARCH_SUPPORTS_MSI, acked by PCI maintainers" * tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (266 commits) tegra-cpuidle: provide stub when !CONFIG_CPU_IDLE PCI: tegra: replace devm_request_and_ioremap by devm_ioremap_resource ARM: tegra: Drop ARCH_SUPPORTS_MSI and sort list ARM: dts: vf610-twr: enable i2c0 device ARM: dts: i.MX51: Add one more I2C2 pinmux entry ARM: dts: i.MX51: Move pins configuration under "iomuxc" label ARM: dtsi: imx6qdl-sabresd: Add USB OTG vbus pin to pinctrl_hog ARM: dtsi: imx6qdl-sabresd: Add USB host 1 VBUS regulator ARM: dts: imx27-phytec-phycore-som: Enable AUDMUX ARM: dts: i.MX27: Disable AUDMUX in the template ARM: dts: wandboard: Add support for SDIO bcm4329 ARM: i.MX5 clocks: Remove optional clock setup (CKIH1) from i.MX51 template ARM: dts: imx53-qsb: Make USBH1 functional ARM i.MX6Q: dts: Enable I2C1 with EEPROM and PMIC on Phytec phyFLEX-i.MX6 Ouad module ARM i.MX6Q: dts: Enable SPI NOR flash on Phytec phyFLEX-i.MX6 Ouad module ARM: dts: imx6qdl-sabresd: Add touchscreen support ARM: imx: add ocram clock for imx53 ARM: dts: imx: ocram size is different between imx6q and imx6dl ARM: dts: imx27-phytec-phycore-som: Fix regulator settings ARM: dts: i.MX27: Remove clock name from CPU node ...
2013-09-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix for the annoying paravirt.o build warning under allmodconfig, and a MAINTAINERS file update" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, doc: Add an entry in MAINTAINERS for arch/x86/kernel/cpu/vmware.c x86, paravirt: Remove duplicate definition for DEF_NATIVE
2013-09-05Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Gleb Natapov: "The highlights of the release are nested EPT and pv-ticketlocks support (hypervisor part, guest part, which is most of the code, goes through tip tree). Apart of that there are many fixes for all arches" Fix up semantic conflicts as discussed in the pull request thread.. * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits) ARM: KVM: Add newlines to panic strings ARM: KVM: Work around older compiler bug ARM: KVM: Simplify tracepoint text ARM: KVM: Fix kvm_set_pte assignment ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256 ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs ARM: KVM: vgic: fix GICD_ICFGRn access ARM: KVM: vgic: simplify vgic_get_target_reg KVM: MMU: remove unused parameter KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: x86: update masterclock when kvmclock_offset is calculated (v2) KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation KVM: x86: add comments where MMIO does not return to the emulator KVM: vmx: count exits to userspace during invalid guest emulation KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp kvm: optimize away THP checks in kvm_is_mmio_pfn() ...
2013-09-04Merge branch 'x86-spinlocks-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 spinlock changes from Ingo Molnar: "The biggest change here are paravirtualized ticket spinlocks (PV spinlocks), which bring a nice speedup on various benchmarks. The KVM host side will come to you via the KVM tree" * 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static" kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor kvm guest: Add configuration support to enable debug information for KVM Guests kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi xen, pvticketlock: Allow interrupts to be enabled while blocking x86, ticketlock: Add slowpath logic jump_label: Split jumplabel ratelimit x86, pvticketlock: When paravirtualizing ticket locks, increment by 2 x86, pvticketlock: Use callee-save for lock_spinning xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks xen, pvticketlock: Xen implementation for PV ticket locks xen: Defer spinlock setup until boot CPU setup x86, ticketlock: Collapse a layer of functions x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-04Merge branch 'x86-smap-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SMAP fixes from Ingo Molnar: "Fixes for Intel SMAP support, to fix SIGSEGVs during bootup" * 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP x86, smap: Handle csum_partial_copy_*_user()
2013-09-04Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "[ The reason for drivers/ updates is that Boris asked for the drivers/edac/ changes to go via x86/ras in this cycle ] Main changes: - AMD CPUs: . Add ECC event decoding support for new F15h models . Various erratum fixes . Fix single-channel on dual-channel-controllers bug. - Intel CPUs: . UC uncorrectable memory error parsing fix . Add support for CMC (Corrected Machine Check) 'FF' (Firmware First) flag in the APEI HEST - Various cleanups and fixes" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: amd64_edac: Fix incorrect wraparounds amd64_edac: Correct erratum 505 range cpc925_edac: Use proper array termination x86/mce, acpi/apei: Only disable banks listed in HEST if mce is configured amd64_edac: Get rid of boot_cpu_data accesses amd64_edac: Add ECC decoding support for newer F15h models x86, amd_nb: Clarify F15h, model 30h GART and L3 support pci_ids: Add PCI device ID functions 3 and 4 for newer F15h models. x38_edac: Make a local function static i3200_edac: Make a local function static x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors APEI/ERST: Fix error message formatting amd64_edac: Fix single-channel setups EDAC: Replace strict_strtol() with kstrtol() mce: acpi/apei: Soft-offline a page on firmware GHES notification mce: acpi/apei: Add a boot option to disable ff mode for corrected errors mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
2013-09-04Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform documentation fix from Ingo Molnar. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/acpi: Correct out-of-date comment of __acpi_map_table()
2013-09-04Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt changes from Ingo Molnar: "Hypervisor signature detection cleanup and fixes - the goal is to make KVM guests run better on MS/Hyperv and to generalize and factor out the code a bit" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Correctly detect hypervisor x86, kvm: Switch to use hypervisor_cpuid_base() xen: Switch to use hypervisor_cpuid_base() x86: Introduce hypervisor_cpuid_base()
2013-09-04x86, paravirt: Remove duplicate definition for DEF_NATIVEH. Peter Anvin
DEF_NATIVE() is defined in paravirt_types.h, remove duplicate definition in paravirt.c Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Andi Kleen <ak@linux.kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/CA%2B55aFxVv==DC0JdS87V%2BcPr-twN%2BTujYg5XmgHOjJOAkZ4xwQ@mail.gmail.com
2013-09-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc smaller fixes: - a parse_setup_data() boot crash fix - a memblock and an __early_ioremap cleanup - turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable option and turn it off - it's an unrobust debug facility, it shouldn't be enabled by default" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: avoid remapping data in parse_setup_data() x86: Use memblock_set_current_limit() to set limit for memblock. mm: Remove unused variable idx0 in __early_ioremap() mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
2013-09-04Merge branch 'x86-fb-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fb changes from Ingo Molnar: "This tree includes preparatory patches for SimpleDRM driver support, by David Herrmann. They clean up x86 framebuffer support by creating simplefb devices wherever possible. More background can be found at http://lwn.net/Articles/558104/" * 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fbdev: fbcon: select VT_HW_CONSOLE_BINDING fbdev: efifb: bind to efi-framebuffer fbdev: vesafb: bind to platform-framebuffer device fbdev: simplefb: add common x86 RGB formats x86: sysfb: move EFI quirks from efifb to sysfb x86: provide platform-devices for boot-framebuffers fbdev: simplefb: mark as fw and allocate apertures fbdev: simplefb: add init through platform_data
2013-09-04Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu feature fixes from Ingo Molnar: "Two small cpufeature support updates" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix override new_cpu_data.x86 with 486 x86, cpufeature: Use new CC_HAVE_ASM_GOTO
2013-09-04Merge branch 'x86-asmlinkage-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asmlinkage changes from Ingo Molnar: "As a preparation for Andi Kleen's LTO patchset (link time optimizations using GCC's -flto which build time optimization has steadily increased in quality over the past few years and might eventually be usable for the kernel too) this tree includes a handful of preparatory patches that make function calling convention annotations consistent again: - Mark every function without arguments (or 64bit only) that is used by assembly code with asmlinkage() - Mark every function with parameters or variables that is used by assembly code as __visible. For the vanilla kernel this has documentation, consistency and debuggability advantages, for the time being" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asmlinkage: Fix warning in xen asmlinkage change x86, asmlinkage, vdso: Mark vdso variables __visible x86, asmlinkage, power: Make various symbols used by the suspend asm code visible x86, asmlinkage: Make dump_stack visible x86, asmlinkage: Make 64bit checksum functions visible x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops x86, asmlinkage, apm: Make APM data structure used from assembler visible x86, asmlinkage: Make syscall tables visible x86, asmlinkage: Make several variables used from assembler/linker script visible x86, asmlinkage: Make kprobes code visible and fix assembler code x86, asmlinkage: Make various syscalls asmlinkage x86, asmlinkage: Make 32bit/64bit __switch_to visible x86, asmlinkage: Make _*_start_kernel visible x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible x86, asmlinkage: Change dotraplinkage into __visible on 32bit x86: Fix sys_call_table type in asm/syscall.h
2013-09-04Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Smaller fixes" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Check attr against the previous setting when programmed more than once x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock x86/acpi: Fix incorrect sanity check in acpi_register_lapic()
2013-09-04Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "As a first remark I'd like to point out that the obsolete '-f' (--force) option, which has not done anything for several releases, has been removed from 'perf record' and related utilities. Everyone please update muscle memory accordingly! :-) Main changes on the perf kernel side: - Performance optimizations: . for trace events, by Steve Rostedt. . for time values, by Peter Zijlstra - New hardware support: . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan . for Intel SNB-EP uncore PMUs, by Zheng Yan - Enhanced hardware support: . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan - Core perf events code enhancements and fixes: . for full-nohz feature handling, by Frederic Weisbecker . for group events, by Jiri Olsa . for call chains, by Frederic Weisbecker . for event stream parsing, by Adrian Hunter - New ABI details: . Add attr->mmap2 attribute, by Stephane Eranian . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa . Export u64 time_zero on the mmap header page to allow TSC calculation, by Adrian Hunter . Add dummy software event, by Adrian Hunter. . Add a new PERF_SAMPLE_IDENTIFIER to make samples always parseable, by Adrian Hunter. . Make Power7 events available via sysfs, by Runzhen Wang. - Code cleanups and refactorings: . for nohz-full, by Frederic Weisbecker . for group events, by Jiri Olsa - Documentation updates: . for perf_event_type, by Peter Zijlstra Main changes on the perf tooling side (some of these tooling changes utilize the above kernel side changes): - Lots of 'perf trace' enhancements: . Make 'perf trace' command line arguments consistent with 'perf record', by David Ahern. . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo. . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo. . Support ! in -e expressions, to filter a list of syscalls, by Arnaldo Carvalho de Melo. . Arg formatting improvements to allow masking arguments in syscalls such as futex and open, where the some arguments are ignored and thus should not be printed depending on other args, by Arnaldo Carvalho de Melo. . Beautify futex open, openat, open_by_handle_at, lseek and futex syscalls, by Arnaldo Carvalho de Melo. . Add option to analyze events in a file versus live, so that one can do: [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ] [root@zoo ~]# perf trace -i perf.data -e futex --duration 1 17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua 113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967 133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496 [root@zoo ~]# By David Ahern. . Honor target pid / tid options when analyzing a file, by David Ahern. . Introduce better formatting of syscall arguments, including so far beautifiers for mmap, madvise, syscall return values, by Arnaldo Carvalho de Melo. . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern. - 'perf report/top' enhancements: . Do annotation using /proc/kcore and /proc/kallsyms when available, removing the forced need for a vmlinux file kernel assembly annotation. This also improves this use case because vmlinux has just the initial kernel image, not what is actually in use after various code patchings by things like alternatives. By Adrian Hunter. . Add --ignore-callees=<regex> option to collapse undesired parts of call graphs, by Greg Price. . Simplify symbol filtering by doing it at machine class level, by Adrian Hunter. . Add support for callchains in the gtk UI, by Namhyung Kim. . Add --objdump option to 'perf top', by Sukadev Bhattiprolu. - 'perf kvm' enhancements: . Add option to print only events that exceed a specified time duration, by David Ahern. . Improve stack trace printing, by David Ahern. . Update documentation of the live command, by David Ahern . Add perf kvm stat live mode that combines aspects of 'perf kvm stat' record and report, by David Ahern. . Add option to analyze specific VM in perf kvm stat report, by David Ahern. . Do not require /lib/modules/* on a guest, by Jason Wessel. - 'perf script' enhancements: . Fix symbol offset computation for some dsos, by David Ahern. . Fix named threads support, by David Ahern. . Don't install scripting files files when perl/python support is disabled, by Arnaldo Carvalho de Melo. - 'perf test' enhancements: . Add various improvements and fixes to the "vmlinux matches kallsyms" 'perf test' entry, related to the /proc/kcore annotation feature. By Adrian Hunter. . Add sample parsing test, by Adrian Hunter. . Add test for reading object code, by Adrian Hunter. . Add attr record group sampling test, by Jiri Olsa. . Misc testing infrastructure improvements and other details, by Jiri Olsa. - 'perf list' enhancements: . Skip unsupported hardware events, by Namhyung Kim. . List pmu events, by Andi Kleen. - 'perf diff' enhancements: . Add support for more than two files comparison, by Jiri Olsa. - 'perf sched' enhancements: . Various improvements, including removing reliance on some scheduler tracepoints that provide the same information as the PERF_RECORD_{FORK,EXIT} events. By David Ahern. . Remove odd build stall by moving a large struct initialization from a local variable to a global one, by Namhyung Kim. - 'perf stat' enhancements: . Add --initial-delay option to skip measuring for a defined startup phase, by Andi Kleen. - Generic perf tooling infrastructure/plumbing changes: . Tidy up sample parsing validation, by Adrian Hunter. . Fix up jobserver setup in libtraceevent Makefile. by Arnaldo Carvalho de Melo. . Debug improvements, by Adrian Hunter. . Fix correlation of samples coming after PERF_RECORD_EXIT event, by David Ahern. . Improve robustness of the topology parsing code, by Stephane Eranian. . Add group leader sampling, that allows just one event in a group to sample while the other events have just its values read, by Jiri Olsa. . Add support for a new modifier "D", which requests that the event, or group of events, be pinned to the PMU. By Michael Ellerman. . Support callchain sorting based on addresses, by Andi Kleen . Prep work for multi perf data file storage, by Jiri Olsa. . libtraceevent cleanups, by Namhyung Kim. And lots and lots of other fixes and code reorganizations that did not make it into the list, see the shortlog, diffstat and the Git log for details!" [ Also merge a leftover from the 3.11 cycle ] * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Prevent race in unthrottling code * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits) perf trace: Tell arg formatters the arg index perf trace: Add beautifier for open's flags arg perf trace: Add beautifier for lseek's whence arg perf tools: Fix symbol offset computation for some dsos perf list: Skip unsupported events perf tests: Add 'keep tracking' test perf tools: Add support for PERF_COUNT_SW_DUMMY perf: Add a dummy software event to keep tracking perf trace: Add beautifier for futex 'operation' parm perf trace: Allow syscall arg formatters to mask args perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node() perf: Export struct perf_branch_entry to userspace perf: Add attr->mmap2 attribute to an event perf/x86: Add Silvermont (22nm Atom) support perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X perf trace: Handle missing HUGEPAGE defines perf trace: Honor target pid / tid options when analyzing a file perf trace: Add option to analyze events in a file versus live perf evlist: Add tracepoint lookup by name perf tests: Add a sample parsing test ...
2013-09-02perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()Joe Perches
Use the convenience function instead of __GFP_ZERO. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/f58599ae1a8d7b32d37e9cf283e95fba6452f7f6.1377809875.git.joe@perches.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf/x86: Add Silvermont (22nm Atom) supportYan, Zheng
Compared to old atom, Silvermont has offcore and has more events that support PEBS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_XYan, Zheng
Silvermont (22nm Atom) has two offcore response configuration MSRs, unlike other Intel CPU, its event code for MSR_OFFCORE_RSP_1 is 0x02b7. To avoid complicating intel_fixup_er(), use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X. So intel_fixup_er() can find the event code for OFFCORE_RSP_N by x86_pmu.extra_regs[N].event. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-01Introduce [compat_]save_altstack_ex() to unbreak x86 SMAPAl Viro
For performance reasons, when SMAP is in use, SMAP is left open for an entire put_user_try { ... } put_user_catch(); block, however, calling __put_user() in the middle of that block will close SMAP as the STAC..CLAC constructs intentionally do not nest. Furthermore, using __put_user() rather than put_user_ex() here is bad for performance. Thus, introduce new [compat_]save_altstack_ex() helpers that replace __[compat_]save_altstack() for x86, being currently the only architecture which supports put_user_try { ... } put_user_catch(). Reported-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.8+ Link: http://lkml.kernel.org/n/tip-es5p6y64if71k8p5u08agv9n@git.kernel.org
2013-08-29Merge branch 'linus' into perf/coreIngo Molnar
Pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-26Merge branch 'acpi-sleep'Rafael J. Wysocki
* acpi-sleep: x86 / tboot / ACPI: Fail extended mode reduced hardware sleep xen / ACPI: notify xen when reduced hardware sleep is available ACPI / sleep: Introduce acpi_os_prepare_extended_sleep() for extended sleep path
2013-08-26x86/ioapic: Check attr against the previous setting when programmed more ↵Liu Ping Fan
than once When programming ioapic pinX more than once, current code does not check whether the later attr (trigger & polarity) is the same as the former or not. This causes broken semantics which can be observed in a qemu q35 machine, where ioapic's ioredtbl[x] can never be set as low-active, even if the hpet driver registered it. And hpet driver may share a high-level active IRQ line with other devices. So in qemu, when hpet-dev asserts low-level as kernel expects, the kernel has no response. With this patch, we can observe an ioredtbl[x] set as low-active for hpet. Fix it by reporting -EBUSY to the caller, when attr is different. Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1377248327-19633-1-git-send-email-pingfank@linux.vnet.ibm.com [ Made small readability edits to both the changelog and the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-22x86 get_unmapped_area: Access mmap_legacy_base through mm_struct memberRadu Caragea
This is the updated version of df54d6fa5427 ("x86 get_unmapped_area(): use proper mmap base for bottom-up direction") that only randomizes the mmap base address once. Signed-off-by: Radu Caragea <sinaelgl@gmail.com> Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Adrian Sendroiu <molecula2788@gmail.com> Cc: Greg KH <greg@kroah.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-22Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction"Linus Torvalds
This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826. The commit isn't necessarily wrong, but because it recalculates the random mmap_base every time, it seems to confuse user memory allocators that expect contiguous mmap allocations even when the mmap address isn't specified. In particular, the MATLAB Java runtime seems to be unhappy. See https://bugzilla.kernel.org/show_bug.cgi?id=60774 So we'll want to apply the random offset only once, and Radu has a patch for that. Revert this older commit in order to apply the other one. Reported-by: Jeff Shorey <shoreyjeff@gmail.com> Cc: Radu Caragea <sinaelgl@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-21Merge tag 'tegra-for-3.12-soc' of ↵Kevin Hilman
git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra into next/soc From: Stephen Warren: ARM: tegra: core SoC enhancements for 3.12 This branch includes a number of enhancements to core SoC support for Tegra devices. The major new features are: * Adds a new CPU-power-gated cpuidle state for Tegra114. * Adds initial system suspend support for Tegra114, initially supporting just CPU-power-gating during suspend. * Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode both gates CPU power, and places the DRAM into self-refresh mode. * A new DT-driven PCIe driver to Tegra20/30. The driver is also moved from arch/arm/mach-tegra/ to drivers/pci/host/. The PCIe driver work depends on the following tag from Thomas Petazzoni: git://git.infradead.org/linux-mvebu.git mis-3.12.2 ... which is merged into the middle of this pull request. * tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra: (33 commits) ARM: tegra: disable LP2 cpuidle state if PCIe is enabled MAINTAINERS: Add myself as Tegra PCIe maintainer PCI: tegra: set up PADS_REFCLK_CFG1 PCI: tegra: Add Tegra 30 PCIe support PCI: tegra: Move PCIe driver to drivers/pci/host PCI: msi: add default MSI operations for !HAVE_GENERIC_HARDIRQS platforms ARM: tegra: add LP1 suspend support for Tegra114 ARM: tegra: add LP1 suspend support for Tegra20 ARM: tegra: add LP1 suspend support for Tegra30 ARM: tegra: add common LP1 suspend support clk: tegra114: add LP1 suspend/resume support ARM: tegra: config the polarity of the request of sys clock ARM: tegra: add common resume handling code for LP1 resuming ARM: pci: add ->add_bus() and ->remove_bus() hooks to hw_pci of: pci: add registry of MSI chips PCI: Introduce new MSI chip infrastructure PCI: remove ARCH_SUPPORTS_MSI kconfig option PCI: use weak functions for MSI arch-specific functions ARM: tegra: unify Tegra's Kconfig a bit more ARM: tegra: remove the limitation that Tegra114 can't support suspend ... Signed-off-by: Kevin Hilman <khilman@linaro.org>
2013-08-20x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lockYoshihiro YUNOMAE
Prevent crash_kexec() from deadlocking on ioapic_lock. When crash_kexec() is executed on a CPU, the CPU will take ioapic_lock in disable_IO_APIC(). So if the cpu gets an NMI while locking ioapic_lock, a deadlock will happen. In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC(). You can reproduce this deadlock the following way: 1. Add mdelay(1000) after raw_spin_lock_irqsave() in native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c Although the deadlock can occur without this modification, it will increase the potential of the deadlock problem. 2. Build and install the kernel 3. Set up the OS which will run panic() and kexec when NMI is injected # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf # vim /etc/default/grub add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line # grub2-mkconfig 4. Reboot the OS 5. Run following command for each vcpu on the guest # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done; By running this command, cpus will get ioapic_lock for setting affinity. 6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you use VM). After injecting NMI, panic() is called in an nmi-handler context. Then, kexec will normally run in panic(), but the operation will be stopped by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()-> native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()-> clear_IO_APIC_pin()->ioapic_read_entry(). Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-19Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two AMD microcode loader fixes and an OLPC firmware support fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Fix early microcode loading x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86 x86: Don't clear olpc_ofw_header when sentinel is detected
2013-08-19x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared ↵Raghavendra K T
as static" It was not declared as static since it was thought to be used by pv-flushtlb earlier. Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: <gleb@redhat.com> Cc: <pbonzini@redhat.com> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/1376645921-8056-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16perf/x86/intel/uncore: Enable EV_SEL_EXT bit for PCUYan, Zheng
This patch adds support for the SNB-EP PCU uncore PMU extra_sel_bit (bit 21) which is missing from the documentation in Table-2.75 of Intel Xeon Processor E5-2600 Product Family Uncore Performance Monitoring Guide. It is referred to later in Table-2.81. Without this selection bit explicitly enabled by the kernel, some events such as COREx_TRANSITION_CYCLES do not count correctly. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1376375382-21350-4-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16perf/x86/intel/uncore: Add filter support for QPI boxesYan, Zheng
The QPI uncore boxes have two pairs of MATCH/MASK registers that user to filter packet traffic serviced by QPI link layer. These registers are in auxiliary PCI devices. This patch adds the auxiliary PCI devices to snbep_uncore_pci_ids and adds field definitions for the MATCH/MASK registers. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375856245-10717-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16perf/x86/intel/uncore: Add auxiliary pci device supportYan, Zheng
The QPI uncore boxes have two pairs of MATCH/MASK registers that user to filter packet traffic serviced by QPI link layer. These registers are in auxiliary PCI devices. This patch changes the meaning of (struct pci_device_id)->driver_data. The first 8 bits are device index of the same uncore type, the second 8 bytes are uncore type index. Auxiliary PCI device's type is defined as UNCORE_EXTRA_PCI_DEV(0xff) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375856245-10717-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15Merge tag 'v3.11-rc5' into perf/coreIngo Molnar
Merge Linux 3.11-rc5, to sync up with the latest upstream fixes since -rc1. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-14Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge a bunch of fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/proc/task_mmu.c: fix buffer overflow in add_page_map() arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig" ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id() x86 get_unmapped_area(): use proper mmap base for bottom-up direction ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page ocfs2: Revert 40bd62e to avoid regression in extended allocation drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit hugetlb: fix lockdep splat caused by pmd sharing aoe: adjust ref of head for compound page tails microblaze: fix clone syscall mm: save soft-dirty bits on file pages mm: save soft-dirty bits on swapped pages memcg: don't initialize kmem-cache destroying work for root caches
2013-08-14kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisorSrivatsa Vaddagiri
During smp_boot_cpus paravirtualied KVM guest detects if the hypervisor has required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops. Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130810193849.GA25260@linux.vnet.ibm.com Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com> [Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes, addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-14Merge tag 'amd_f15_m30' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull AMD F15h, model 0x30 and later enablement stuff, more specifically EDAC support, from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-14x86: avoid remapping data in parse_setup_data()Linn Crosetto
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size larger than early_memremap() is able to handle, which is currently limited to 256kB. If this occurs it leads to a NULL dereference in parse_setup_data(). To avoid this, remap the setup_data header and allow parsing functions for individual types to handle their own data remapping. Signed-off-by: Linn Crosetto <linn@hp.com> Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com Acked-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-14x86: Use memblock_set_current_limit() to set limit for memblock.Tang Chen
In setup_arch() of x86, it set memblock.current_limit directly. We should use memblock_set_current_limit(). If the implementation is changed, it is easy to maintain. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1376451844-15682-1-git-send-email-tangchen@cn.fujitsu.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-14x86 get_unmapped_area(): use proper mmap base for bottom-up directionRadu Caragea
When the stack is set to unlimited, the bottomup direction is used for mmap-ings but the mmap_base is not used and thus effectively renders ASLR for mmapings along with PIE useless. Cc: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Sendroiu <molecula2788@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two small fixlets" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add Haswell ULT model number used in Macbook Air and other systems perf/x86: Fix intel QPI uncore event definitions
2013-08-12Merge tag 'please-pull-mce-f-bit' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE-uncorrected-error fix from Tony Luck: "Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12x86, microcode, AMD: Fix early microcode loadingTorsten Kaiser
load_microcode_amd() (and the helper it is using) should not have an cpu parameter. The microcode loading does not depend on the CPU wrt the patches loaded since they will end up in a global list for all CPUs anyway. The change from cpu to x86family in load_microcode_amd() now allows to drop the code messing with cpu_data(cpu) from collect_cpu_info_amd_early(), which is wrong anyway because at that point the per-cpu cpu_info is not yet setup (These values would later be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()). Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(), because its only used at one place and without the cpuinfo_x86 accesses it was not much left. Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com> [ Fengguang: build fix ] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> [ Boris: adapt it to current tree. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct ↵Torsten Kaiser
cpuinfo_x86 cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info(). If early microcode loading is enabled its collect_cpu_info_amd_early() will fill ->x86 and so the fallback to boot_cpu_data is not used. But ->x86_vendor was not filled and is still X86_VENDOR_INTEL resulting in no errata fixes getting applied and my system hangs on boot. Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only caller init_amd() will have a struct cpuinfo_x86 as parameter and the set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses that struct. So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum() and the broken fallback can be dropped. [ Boris: Drop WARN_ON() since we're called only from init_amd() ] Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12Merge branch 'x86/mce' into x86/rasIngo Molnar
Pursue a single RAS/MCE topic branch on x86. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12PCI: use weak functions for MSI arch-specific functionsThomas Petazzoni
Until now, the MSI architecture-specific functions could be overloaded using a fairly complex set of #define and compile-time conditionals. In order to prepare for the introduction of the msi_chip infrastructure, it is desirable to switch all those functions to use the 'weak' mechanism. This commit converts all the architectures that were overidding those MSI functions to use the new strategy. Note that we keep two separate, non-weak, functions default_teardown_msi_irqs() and default_restore_msi_irqs() for the default behavior of the arch_teardown_msi_irqs() and arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI code. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Daniel Price <daniel.price@gmail.com> Tested-by: Thierry Reding <thierry.reding@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: linux-s390@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: linux-ia64@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2013-08-12x86, amd_nb: Clarify F15h, model 30h GART and L3 supportAravind Gopalakrishnan
F15h, models 0x30 and later don't have a GART. Note that. Also check CPUID leaf 0x80000006 for L3 prescence because there are models which don't sport an L3 cache. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: rewrite commit message, cleanup comments. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12perf/x86: Add Haswell ULT model number used in Macbook Air and other systemsAndi Kleen
This one was missed earlier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1376007983-31616-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-09x86, ticketlock: Add slowpath logicJeremy Fitzhardinge
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker: Unlocker Locker test for lock pickup -> fail unlock test slowpath -> false set slowpath flags block Whereas this works in any ordering: Unlocker Locker set slowpath flags test for lock pickup -> fail block unlock test slowpath -> true, kick If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that its safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't the generated code isn't too bad, but its definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, pvticketlock: Use callee-save for lock_spinningJeremy Fitzhardinge
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, spinlock: Replace pv spinlocks with pv ticketlocksJeremy Fitzhardinge
Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow patch (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets), it looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. Results: ======= setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM base = 3.11-rc patched = base + pvspinlock V12 +-----------------+----------------+--------+ dbench (Throughput in MB/sec. Higher is better) +-----------------+----------------+--------+ | base (stdev %)|patched(stdev%) | %gain | +-----------------+----------------+--------+ | 15035.3 (0.3) |15150.0 (0.6) | 0.8 | | 1470.0 (2.2) | 1713.7 (1.9) | 16.6 | | 848.6 (4.3) | 967.8 (4.3) | 14.0 | | 652.9 (3.5) | 685.3 (3.7) | 5.0 | +-----------------+----------------+--------+ pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommits results are flat Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt opsAndi Kleen
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-13-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>