summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2015-10-10Merge branch 'stable/for-linus-4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fixlet from Konrad Rzeszutek Wilk: "Enable the SWIOTLB under 32-bit PAE kernels. Nowadays most distros enable this due to CONFIG_HYPERVISOR|XEN=y which select SWIOTLB. But for those that are not interested in virtualization and wanting to use 32-bit PAE kernels and wanting to have working DMA operations - this configures it for them" * 'stable/for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Enable it under x86 PAE
2015-10-07swiotlb: Enable it under x86 PAEChristian Melki
Most distributions end up enabling SWIOTLB already with 32-bit kernels due to the combination of CONFIG_HYPERVISOR_GUEST|CONFIG_XEN=y as those end up requiring the SWIOTLB. However for those that are not interested in virtualization and run in 32-bit they will discover that: "32-bit PAE 4.2.0 kernel (no IOMMU code) would hang when writing to my USB disk. The kernel spews million(-ish messages per sec) to syslog, effectively "hanging" userspace with my kernel. Oct 2 14:33:06 voodoochild kernel: [ 223.287447] nommu_map_sg: overflow 25dcac000+1024 of device mask ffffffff Oct 2 14:33:06 voodoochild kernel: [ 223.287448] nommu_map_sg: overflow 25dcac000+1024 of device mask ffffffff Oct 2 14:33:06 voodoochild kernel: [ 223.287449] nommu_map_sg: overflow 25dcac000+1024 of device mask ffffffff ... etc ..." Enabling it makes the problem go away. N.B. With a6dfa128ce5c414ab46b1d690f7a1b8decb8526d "config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected" we also have the important part of the SG macros enabled to make this work properly - in case anybody wants to backport this patch. Reported-and-Tested-by: Christian Melki <christian.melki@t2data.com> Signed-off-by: Christian Melki <christian.melki@t2data.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-10-06Merge tag 'for-linus-4.3b-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix VM save performance regression with x86 PV guests - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI) - Other minor fixes. * tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen/p2m: hint at the last populated P2M entry x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map x86/xen: Support kexec/kdump in HVM guests by doing a soft reset xen/x86: Don't try to write syscall-related MSRs for PV guests xen: use correct type for HYPERVISOR_memory_op()
2015-10-06x86/xen/p2m: hint at the last populated P2M entryDavid Vrabel
With commit 633d6f17cd91ad5bf2370265946f716e42d388c6 (x86/xen: prepare p2m list for memory hotplug) the P2M may be sized to accomdate a much larger amount of memory than the domain currently has. When saving a domain, the toolstack must scan all the P2M looking for populated pages. This results in a performance regression due to the unnecessary scanning. Instead of reporting (via shared_info) the maximum possible size of the P2M, hint at the last PFN which might be populated. This hint is increased as new leaves are added to the P2M (in the expectation that they will be used for populated entries). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: <stable@vger.kernel.org> # 4.0+
2015-10-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two build failure fixes for corner case configs, x32 header fix and a speling fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds x86/mm: Set NX on gap between __ex_table and rodata x86/kexec: Fix kexec crash in syscall kexec_file_load() x86/process: Unify 32bit and 64bit implementations of get_wchan() x86/process: Add proper bound checks in 64bit get_wchan() x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
2015-10-03Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that can trigger under newer EFI firmware" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
2015-10-02x86/headers/uapi: Fix __BITS_PER_LONG value for x32 buildsBen Hutchings
On x32, gcc predefines __x86_64__ but long is only 32-bit. Use __ILP32__ to distinguish x32. Fixes this compiler error in perf: tools/include/asm-generic/bitops/__ffs.h: In function '__ffs': tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow] word >>= 32; ^ This isn't sufficient to build perf for x32, though. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02x86/mm: Set NX on gap between __ex_table and rodataStephen Smalley
Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02x86/kexec: Fix kexec crash in syscall kexec_file_load()Lee, Chun-Yi
The original bug is a page fault crash that sometimes happens on big machines when preparing ELF headers: BUG: unable to handle kernel paging request at ffffc90613fc9000 IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260 The bug is caused by us under-counting the number of memory ranges and subsequently not allocating enough ELF header space for them. The bug is typically masked on smaller systems, because the ELF header allocation is rounded up to the next page. This patch modifies the code in fill_up_crash_elf_data() by using walk_system_ram_res() instead of walk_system_ram_range() to correctly count the max number of crash memory ranges. That's because the walk_system_ram_range() filters out small memory regions that reside in the same page, but walk_system_ram_res() does not. Here's how I found the bug: After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(), the code uses walk_system_ram_res() to fill-in crash memory regions information to the program header, so it counts those small memory regions that reside in a page area. But, when the kernel was using walk_system_ram_range() in fill_up_crash_elf_data() to count the number of crash memory regions, it filters out small regions. I printed those small memory regions, for example: kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0 Based on the code in walk_system_ram_range(), this memory region will be filtered out: pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593 end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593 end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE So, the max_nr_ranges that's counted by the kernel doesn't include small memory regions - causing us to under-allocate the required space. That causes the page fault crash that happens in a later code path when preparing ELF headers. This bug is not easy to reproduce on small machines that have few CPUs, because the allocated page aligned ELF buffer has more free space to cover those small memory regions' PT_LOAD headers. Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "12 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dmapool: fix overflow condition in pool_find_page() thermal: avoid division by zero in power allocator memcg: remove pcp_counter_lock kprobes: use _do_fork() in samples to make them work again drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE memcg: make mem_cgroup_read_stat() unsigned memcg: fix dirty page migration dax: fix NULL pointer in __dax_pmd_fault() mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) userfaultfd: remove kernel header include from uapi header arch/x86/include/asm/efi.h: fix build failure
2015-10-02arch/x86/include/asm/efi.h: fix build failureAndrey Ryabinin
With KMEMCHECK=y, KASAN=n: arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration] Don't #undef memcpy if KASAN=n. Fixes: 769a8089c1fd ("x86, efi, kasan: #undef memset/memcpy/memmove per arch") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "(Relatively) a lot of reverts, mostly. Bugs have trickled in for a new feature in 4.2 (MTRR support in guests) so I'm reverting it all; let's not make this -rc period busier for KVM than it's been so far. This covers the four reverts from me. The fifth patch is being reverted because Radim found a bug in the implementation of stable scheduler clock, *but* also managed to implement the feature entirely without hypervisor support. So instead of fixing the hypervisor side we can remove it completely; 4.4 will get the new implementation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS Update KVM homepage Url Revert "KVM: SVM: use NPT page attributes" Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask" Revert "KVM: SVM: Sync g_pat with guest-written PAT value" Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages" Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
2015-10-01Use WARN_ON_ONCE for missing X86_FEATURE_NRIPSDirk Müller
The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Cc: stable@vger.kernel.org Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: SVM: use NPT page attributes"Paolo Bonzini
This reverts commit 3c2e7f7de3240216042b61073803b61b9b3cfb22. Initializing the mapping from MTRR to PAT values was reported to fail nondeterministically, and it also caused extremely slow boot (due to caching getting disabled---bug 103321) with assigned devices. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Sebastian Schuette <dracon@ewetel.net> Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"Paolo Bonzini
This reverts commit 5492830370171b6a4ede8a3bfba687a8d0f25fa5. It builds on the commit that is being reverted next. Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: SVM: Sync g_pat with guest-written PAT value"Paolo Bonzini
This reverts commit e098223b789b4a618dacd79e5e0dad4a9d5018d1, which has a dependency on other commits being reverted. Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"Paolo Bonzini
This reverts commit fd717f11015f673487ffc826e59b2bad69d20fe5. It was reported to cause Machine Check Exceptions (bug 104091). Reported-by: harn-solo@gmx.de Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, ↵Matt Fleming
instead of top-down Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that signals that the firmware PE/COFF loader supports splitting code and data sections of PE/COFF images into separate EFI memory map entries. This allows the kernel to map those regions with strict memory protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc. Unfortunately, an unwritten requirement of this new feature is that the regions need to be mapped with the same offsets relative to each other as observed in the EFI memory map. If this is not done crashes like this may occur, BUG: unable to handle kernel paging request at fffffffefe6086dd IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd Call Trace: [<ffffffff8104c90e>] efi_call+0x7e/0x100 [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90 [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70 [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392 [<ffffffff81f37e1b>] start_kernel+0x38a/0x417 [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef Here 0xfffffffefe6086dd refers to an address the firmware expects to be mapped but which the OS never claimed was mapped. The issue is that included in these regions are relative addresses to other regions which were emitted by the firmware toolchain before the "splitting" of sections occurred at runtime. Needless to say, we don't satisfy this unwritten requirement on x86_64 and instead map the EFI memory map entries in reverse order. The above crash is almost certainly triggerable with any kernel newer than v3.13 because that's when we rewrote the EFI runtime region mapping code, in commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For kernel versions before v3.13 things may work by pure luck depending on the fragmentation of the kernel virtual address space at the time we map the EFI regions. Instead of mapping the EFI memory map entries in reverse order, where entry N has a higher virtual address than entry N+1, map them in the same order as they appear in the EFI memory map to preserve this relative offset between regions. This patch has been kept as small as possible with the intention that it should be applied aggressively to stable and distribution kernels. It is very much a bugfix rather than support for a new feature, since when EFI_PROPERTIES_TABLE is enabled we must map things as outlined above to even boot - we have no way of asking the firmware not to split the code/data regions. In fact, this patch doesn't even make use of the more strict memory protections available in UEFI v2.5. That will come later. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chun-Yi <jlee@suse.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-01Merge remote-tracking branch 'tglx/x86/urgent' into x86/urgentIngo Molnar
Pick up the WCHAN fixes from Thomas. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30x86/process: Unify 32bit and 64bit implementations of get_wchan()Thomas Gleixner
The stack layout and the functionality is identical. Use the 64bit version for all of x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Link: http://lkml.kernel.org/r/20150930083302.779694618@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-30x86/process: Add proper bound checks in 64bit get_wchan()Thomas Gleixner
Dmitry Vyukov reported the following using trinity and the memory error detector AddressSanitizer (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel). [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000 [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a) [ 124.578633] Accessed by thread T10915: [ 124.579295] inlined in describe_heap_address ./arch/x86/mm/asan/report.c:164 [ 124.579295] #0 ffffffff810dd277 in asan_report_error ./arch/x86/mm/asan/report.c:278 [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region ./arch/x86/mm/asan/asan.c:37 [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0 [ 124.581893] #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444 The address checks in the 64bit implementation of get_wchan() are wrong in several ways: - The lower bound of the stack is not the start of the stack page. It's the start of the stack page plus sizeof (struct thread_info) - The upper bound must be: top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long). The 2 * sizeof(unsigned long) is required because the stack pointer points at the frame pointer. The layout on the stack is: ... IP FP ... IP FP. So we need to make sure that both IP and FP are in the bounds. Fix the bound checks and get rid of the mix of numeric constants, u64 and unsigned long. Making all unsigned long allows us to use the same function for 32bit as well. Use READ_ONCE() when accessing the stack. This does not prevent a concurrent wakeup of the task and the stack changing, but at least it avoids TOCTOU. Also check task state at the end of the loop. Again that does not prevent concurrent changes, but it avoids walking for nothing. Add proper comments while at it. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-30x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernelsAndrey Ryabinin
With KMEMCHECK=y, KASAN=n we get this build failure: arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] Don't #undef memcpy if KASAN=n. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 769a8089c1fd ("x86, efi, kasan: #undef memset/memcpy/memmove per arch") Link: http://lkml.kernel.org/r/1443544814-20122-1-git-send-email-ryabinin.a.a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30Merge tag 'v4.3-rc3' into x86/urgent, before applying dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE caseVitaly Kuznetsov
Recent changes in the Hyper-V driver: b4370df2b1f5 ("Drivers: hv: vmbus: add special crash handler") broke the build when CONFIG_KEXEC_CORE is not set: arch/x86/built-in.o: In function `hv_machine_crash_shutdown': arch/x86/kernel/cpu/mshyperv.c:112: undefined reference to `native_machine_crash_shutdown' Decorate all kexec related code with #ifdef CONFIG_KEXEC_CORE. Reported-by: Jim Davis <jim.epost@gmail.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1443002577-25370-1-git-send-email-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing mapMalcolm Crossley
Sanitizing the e820 map may produce extra E820 entries which would result in the topmost E820 entries being removed. The removed entries would typically include the top E820 usable RAM region and thus result in the domain having signicantly less RAM available to it. Fix by allowing sanitize_e820_map to use the full size of the allocated E820 array. Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28x86/xen: Support kexec/kdump in HVM guests by doing a soft resetVitaly Kuznetsov
Currently there is a number of issues preventing PVHVM Xen guests from doing successful kexec/kdump: - Bound event channels. - Registered vcpu_info. - PIRQ/emuirq mappings. - shared_info frame after XENMAPSPACE_shared_info operation. - Active grant mappings. Basically, newly booted kernel stumbles upon already set up Xen interfaces and there is no way to reestablish them. In Xen-4.7 a new feature called 'soft reset' is coming. A guest performing kexec/kdump operation is supposed to call SCHEDOP_shutdown hypercall with SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor (with some help from toolstack) will do full domain cleanup (but keeping its memory and vCPU contexts intact) returning the guest to the state it had when it was first booted and thus allowing it to start over. Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is probably OK as by default all unknown shutdown reasons cause domain destroy with a message in toolstack log: 'Unknown shutdown reason code 5. Destroying domain.' which gives a clue to what the problem is and eliminates false expectations. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28xen/x86: Don't try to write syscall-related MSRs for PV guestsBoris Ostrovsky
For PV guests these registers are set up by hypervisor and thus should not be written by the guest. The comment in xen_write_msr_safe() says so but we still write the MSRs, causing the hypervisor to print a warning. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28xen: use correct type for HYPERVISOR_memory_op()Juergen Gross
HYPERVISOR_memory_op() is defined to return an "int" value. This is wrong, as the Xen hypervisor will return "long". The sub-function XENMEM_maximum_reservation returns the maximum number of pages for the current domain. An int will overflow for a domain configured with 8TB of memory or more. Correct this by using the correct type. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock ↵Radim Krčmář
system MSR" Shifting pvclock_vcpu_time_info.system_time on write to KVM system time MSR is a change of ABI. Probably only 2.6.16 based SLES 10 breaks due to its custom enhancements to kvmclock, but KVM never declared the MSR only for one-shot initialization. (Doc says that only one write is needed.) This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5. And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Another pile of fixes for perf: - Plug overflows and races in the core code - Sanitize the flow of the perf syscall so we error out before handling the more complex and hard to undo setups - Improve and fix Broadwell and Skylake hardware support - Revert a fix which broke what it tried to fix in perf tools - A couple of smaller fixes in various places of perf tools" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix copying of /proc/kcore perf intel-pt: Remove no_force_psb from documentation perf probe: Use existing routine to look for a kernel module by dso->short_name perf/x86: Change test_aperfmperf() and test_intel() to static tools lib traceevent: Fix string handling in heterogeneous arch environments perf record: Avoid infinite loop at buildid processing with no samples perf: Fix races in computing the header sizes perf: Fix u16 overflows perf: Restructure perf syscall point of no return perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific perf tools: Bool functions shouldn't return -1 tools build: Add test for presence of __get_cpuid() gcc builtin tools build: Add test for presence of numa_num_possible_cpus() in libnuma Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum" perf stat: Fix per-pkg event reporting bug
2015-09-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two bugfixes from Andy addressing at least some of the subtle NMI related wreckage which has been reported by Sasha Levin" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI code x86/paravirt: Replace the paravirt nop with a bona fide empty function
2015-09-25Merge tag 'pci-v4.3-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for things we merged for v4.3 (VPD, MSI, and bridge window management), and a new Renesas R8A7794 SoC device ID. Details: Resource management: - Revert pci_read_bridge_bases() unification (Bjorn Helgaas) - Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn Helgaas) MSI: - Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson) Renesas R-Car host bridge driver: - Add R8A7794 support (Sergei Shtylyov) Miscellaneous: - Fix devfn for VPD access through function 0 (Alex Williamson) - Use function 0 VPD only for identical functions (Alex Williamson)" * tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: rcar: Add R8A7794 support PCI: Use function 0 VPD for identical functions, regular VPD for others PCI: Fix devfn for VPD access through function 0 PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses PCI: Clear IORESOURCE_UNSET when clipping a bridge window PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
2015-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC bug fixes too" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: disable halt_poll_ns as default for s390x KVM: x86: fix off-by-one in reserved bits check KVM: x86: use correct page table format to check nested page table reserved bits KVM: svm: do not call kvm_set_cr0 from init_vmcb KVM: x86: trap AMD MSRs for the TSeg base and mask KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs kvm: svm: reset mmu on VCPU reset
2015-09-25KVM: disable halt_poll_ns as default for s390xDavid Hildenbrand
We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25KVM: x86: fix off-by-one in reserved bits checkPaolo Bonzini
29ecd6601904 ("KVM: x86: avoid uninitialized variable warning", 2015-09-06) introduced a not-so-subtle problem, which probably escaped review because it was not part of the patch context. Before the patch, leaf was always equal to iterator.level. After, it is equal to iterator.level - 1 in the call to is_shadow_zero_bits_set, and when is_shadow_zero_bits_set does another "-1" the check on reserved bits becomes incorrect. Using "iterator.level" in the call fixes this call trace: WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]() Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq [...] Call Trace: dump_stack+0x4e/0x84 warn_slowpath_common+0x95/0xe0 warn_slowpath_null+0x1a/0x20 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm] tdp_page_fault+0x231/0x290 [kvm] ? emulator_pio_in_out+0x6e/0xf0 [kvm] kvm_mmu_page_fault+0x36/0x240 [kvm] ? svm_set_cr0+0x95/0xc0 [kvm_amd] pf_interception+0xde/0x1d0 [kvm_amd] handle_exit+0x181/0xa70 [kvm_amd] ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm] kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm] ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm] ? preempt_count_sub+0x9b/0xf0 ? mutex_lock_killable_nested+0x26f/0x490 ? preempt_count_sub+0x9b/0xf0 kvm_vcpu_ioctl+0x358/0x710 [kvm] ? __fget+0x5/0x210 ? __fget+0x101/0x210 do_vfs_ioctl+0x2f4/0x560 ? __fget_light+0x29/0x90 SyS_ioctl+0x4c/0x90 entry_SYSCALL_64_fastpath+0x16/0x73 ---[ end trace 37901c8686d84de6 ]--- Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25KVM: x86: use correct page table format to check nested page table reserved bitsPaolo Bonzini
Intel CPUID on AMD host or vice versa is a weird case, but it can happen. Handle it by checking the host CPU vendor instead of the guest's in reset_tdp_shadow_zero_bits_mask. For speed, the check uses the fact that Intel EPT has an X (executable) bit while AMD NPT has NX. Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25KVM: svm: do not call kvm_set_cr0 from init_vmcbPaolo Bonzini
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the memslots array (SRCU protected). Using a mini SRCU critical section is ugly, and adding it to kvm_arch_vcpu_create doesn't work because the VMX vcpu_create callback calls synchronize_srcu. Fixes this lockdep splat: =============================== [ INFO: suspicious RCU usage. ] 4.3.0-rc1+ #1 Not tainted ------------------------------- include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-i38/17000: #0: (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm] [...] Call Trace: dump_stack+0x4e/0x84 lockdep_rcu_suspicious+0xfd/0x130 kvm_zap_gfn_range+0x188/0x1a0 [kvm] kvm_set_cr0+0xde/0x1e0 [kvm] init_vmcb+0x760/0xad0 [kvm_amd] svm_create_vcpu+0x197/0x250 [kvm_amd] kvm_arch_vcpu_create+0x47/0x70 [kvm] kvm_vm_ioctl+0x302/0x7e0 [kvm] ? __lock_is_held+0x51/0x70 ? __fget+0x101/0x210 do_vfs_ioctl+0x2f4/0x560 ? __fget_light+0x29/0x90 SyS_ioctl+0x4c/0x90 entry_SYSCALL_64_fastpath+0x16/0x73 Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25perf/x86: Change test_aperfmperf() and test_intel() to staticGeliang Tang
Fixes the following sparse warnings: arch/x86/kernel/cpu/perf_event_msr.c:13:6: warning: symbol 'test_aperfmperf' was not declared. Should it be static? arch/x86/kernel/cpu/perf_event_msr.c:18:6: warning: symbol 'test_intel' was not declared. Should it be static? Signed-off-by: Geliang Tang <geliangtang@163.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4588e8ab09638458f2451af572827108be3b4a36.1443123796.git.geliangtang@163.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23x86/cpufeatures: Correct spelling of the HWP_NOTIFY flagKristen Carlson Accardi
Because noitification just isn't right. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1442944296-11737-1-git-send-email-kristen@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-22x86, efi, kasan: #undef memset/memcpy/memmove per archAndrey Ryabinin
In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove with not-instrumented analogues __memset/__memcpy/__memove. However, on x86 the EFI stub is not linked with the kernel. It uses not-instrumented mem*() functions from arch/x86/boot/compressed/string.c So we don't replace them with __mem*() variants in EFI stub. On ARM64 the EFI stub is linked with the kernel, so we should replace mem*() functions with __mem*(), because the EFI stub runs before KASAN sets up early shadow. So let's move these #undef mem* into arch's asm/efi.h which is also included by the EFI stub. Also, this will fix the warning in 32-bit build reported by kbuild test robot: efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy' [akpm@linux-foundation.org: use 80 cols in comment] Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI codeAndy Lutomirski
The NMI entry code that switches to the normal kernel stack needs to be very careful not to clobber any extra stack slots on the NMI stack. The code is fine under the assumption that SWAPGS is just a normal instruction, but that assumption isn't really true. Use SWAPGS_UNSAFE_STACK instead. This is part of a fix for some random crashes that Sasha saw. Fixes: 9b6e6a8334d5 ("x86/nmi/64: Switch stacks on userspace NMI entry") Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22x86/paravirt: Replace the paravirt nop with a bona fide empty functionAndy Lutomirski
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an example, trimmed for readability): ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6> 2792: R_X86_64_PC32 pv_irq_ops+0x2c That's a call through a function pointer to regular C function that does nothing on native boots, but that function isn't protected against kprobes, isn't marked notrace, and is certainly not guaranteed to preserve any registers if the compiler is feeling perverse. This is bad news for a CLBR_NONE operation. Of course, if everything works correctly, once paravirt ops are patched, it gets nopped out, but what if we hit this code before paravirt ops are patched in? This can potentially cause breakage that is very difficult to debug. A more subtle failure is possible here, too: if _paravirt_nop uses the stack at all (even just to push RBP), it will overwrite the "NMI executing" variable if it's called in the NMI prologue. The Xen case, perhaps surprisingly, is fine, because it's already written in asm. Fix all of the cases that default to paravirt_nop (including adjust_exception_frame) with a big hammer: replace paravirt_nop with an asm function that is just a ret instruction. The Xen case may have other problems, so document them. This is part of a fix for some random crashes that Sasha saw. Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-21KVM: x86: trap AMD MSRs for the TSeg base and maskPaolo Bonzini
These have roughly the same purpose as the SMRR, which we do not need to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at boot, which causes problems when running a Xen dom0 under KVM. Just return 0, meaning that processor protection of SMRAM is not in effect. Reported-by: M A Young <m.a.young@durham.ac.uk> Cc: stable@vger.kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Mostly stable material, a lot of ARM fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) sched: access local runqueue directly in single_task_running arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' arm64: KVM: Remove all traces of the ThumbEE registers arm: KVM: Disable virtual timer even if the guest is not using it arm64: KVM: Disable virtual timer even if the guest is not using it arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources KVM: s390: Replace incorrect atomic_or with atomic_andnot arm: KVM: Fix incorrect device to IPA mapping arm64: KVM: Fix user access for debug registers KVM: vmx: fix VPID is 0000H in non-root operation KVM: add halt_attempted_poll to VCPU stats kvm: fix zero length mmio searching kvm: fix double free for fast mmio eventfd kvm: factor out core eventfd assign/deassign logic kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd KVM: make the declaration of functions within 80 characters KVM: arm64: add workaround for Cortex-A57 erratum #852523 KVM: fix polling for guest halt continued even if disable it arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores arm64: KVM: set {v,}TCR_EL2 RES1 bits ...
2015-09-18Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This is a rather large update post rc1 due to the final steps of cleanups and API changes which had to wait for the preparatory patches to hit your tree. - Regression fixes for ARM GIC irqchips - Regression fixes and lockdep anotations for renesas irq chips - The leftovers of the cleanup and preparatory patches which have been ignored by maintainers - Final conversions of the newly merged users of obsolete APIs - Final removal of obsolete APIs - Final removal of ARM artifacts which had been introduced during the conversion of ARM to the generic interrupt code. - Final split of the irq_data into chip specific and common data to reflect the needs of hierarchical irq domains. - Treewide removal of the first argument of interrupt flow handlers, i.e. the irq number, which is not used by the majority of handlers and simple to retrieve from the other argument the irq descriptor. - A few comment updates and build warning fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) arm64: Remove ununsed set_irq_flags ARM: Remove ununsed set_irq_flags sh: Kill off set_irq_flags usage irqchip: Kill off set_irq_flags usage gpu/drm: Kill off set_irq_flags usage genirq: Remove irq argument from irq flow handlers genirq: Move field 'msi_desc' from irq_data into irq_common_data genirq: Move field 'affinity' from irq_data into irq_common_data genirq: Move field 'handler_data' from irq_data into irq_common_data genirq: Move field 'node' from irq_data into irq_common_data irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag genirq: Provide IRQD_FORWARDED_TO_VCPU status flag genirq: Simplify irq_data_to_desc() genirq: Remove __irq_set_handler_locked() pinctrl/pistachio: Use irq_set_handler_locked gpio: vf610: Use irq_set_handler_locked powerpc/mpc8xx: Use irq_set_handler_locked() powerpc/ipic: Use irq_set_handler_locked() powerpc/cpm2: Use irq_set_handler_locked() ...
2015-09-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single regression fix for the x86 dma allocator which got wreckaged in the merge window" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci/dma: Fix gfp flags for coherent DMA memory allocation
2015-09-18kvm: svm: reset mmu on VCPU resetIgor Mammedov
When INIT/SIPI sequence is sent to VCPU which before that was in use by OS, VMRUN might fail with: KVM: entry failed, hardware error 0xffffffff EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =9a00 0009a000 0000ffff 00009a00 [...] CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0 [...] EFER=0000000000000000 with corresponding SVM error: KVM: FAILED VMRUN WITH VMCB: [...] cpl: 0 efer: 0000000000001000 cr0: 0000000080010010 cr2: 00007fd7fe85bf90 cr3: 0000000187d0c000 cr4: 0000000000000020 [...] What happens is that VCPU state right after offlinig: CR0: 0x80050033 EFER: 0xd01 CR4: 0x7e0 -> long mode with CR3 pointing to longmode page tables and when VCPU gets INIT/SIPI following transition happens CR0: 0 -> 0x60000010 EFER: 0x0 CR4: 0x7e0 -> paging disabled with stale CR3 However SVM under the hood puts VCPU in Paged Real Mode* which effectively translates CR0 0x60000010 -> 80010010 after svm_vcpu_reset() -> init_vmcb() -> kvm_set_cr0() -> svm_set_cr0() but from kvm_set_cr0() perspective CR0: 0 -> 0x60000010 only caching bits are changed and commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")' regressed svm_vcpu_reset() which relied on MMU being reset. As result VMRUN after svm_vcpu_reset() tries to run VCPU in Paged Real Mode with stale MMU context (longmode page tables), which causes some AMD CPUs** to bail out with VMEXIT_INVALID. Fix issue by unconditionally resetting MMU context at init_vmcb() time. * AMD64 Architecture Programmer’s Manual, Volume 2: System Programming, rev: 3.25 15.19 Paged Real Mode ** Opteron 1216 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Fixes: d81135a57aa6 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-18perf/x86/intel: Fix Skylake FRONTEND MSR extrareg maskAndi Kleen
Stephane pointed out that the extrareg mask was one bit too short. The bubble width field was truncated by one bit. Fix that here. Also add some extra comments on the reserved bits inside the event select code. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1441835640-21347-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-18perf/x86/intel/pebs: Add PEBS frontend profiling for SkylakeAndi Kleen
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile frontend problems (like ITLB or decoding issues). The new event is configured through a separate MSR, which selects a range of sub events. Define the extra MSR as a extra reg and export support for it through sysfs. To avoid duplicating the existing tables use a new function to add new entries to existing tables. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-18perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specificAndi Kleen
The counter constraint for CYCLE_ACTIVITY.* on Broadwell covered all CYCLE_ACTIVITY.* sub events, and forced them on counter 2. But actually only one sub event (umask 8) needs to be on counter 2, all others do not have any constraint. Only force that subevent. This fixes groups with multiple CYCLE_ACTIVITY.* events, for example: % perf stat -x, -e '{cpu/event=0xa3,umask=0x6,cmask=6/,\ cpu/event=0xa2,umask=0x8/,\ cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/}' true 122150,,cpu/event=0xa3,umask=0x6,cmask=6/,846486,100.00 16483,,cpu/event=0xa2,umask=0x8/,846486,100.00 252280,,cpu/event=0xa3,umask=0x4,cmask=4/,846486,100.00 233604,,cpu/event=0xb1,umask=0x1,cmask=1/,846486,100.00 % Without this patch the third result would be <unsupported> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1442267222-16464-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>