summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2016-03-22KVM: page_track: fix access to NULL slotPaolo Bonzini
This happens when doing the reboot test from virt-tests: [ 131.833653] BUG: unable to handle kernel NULL pointer dereference at (null) [ 131.842461] IP: [<ffffffffa0950087>] kvm_page_track_is_active+0x17/0x60 [kvm] [ 131.850500] PGD 0 [ 131.852763] Oops: 0000 [#1] SMP [ 132.007188] task: ffff880075fbc500 ti: ffff880850a3c000 task.ti: ffff880850a3c000 [ 132.138891] Call Trace: [ 132.141639] [<ffffffffa092bd11>] page_fault_handle_page_track+0x31/0x40 [kvm] [ 132.149732] [<ffffffffa093380f>] paging64_page_fault+0xff/0x910 [kvm] [ 132.172159] [<ffffffffa092c734>] kvm_mmu_page_fault+0x64/0x110 [kvm] [ 132.179372] [<ffffffffa06743c2>] handle_exception+0x1b2/0x430 [kvm_intel] [ 132.187072] [<ffffffffa067a301>] vmx_handle_exit+0x1e1/0xc50 [kvm_intel] ... Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: 3d0c27ad6ee465f174b09ee99fcaf189c57d567a Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: PPC: do not compile in vfio.o unconditionallyPaolo Bonzini
Build on 32-bit PPC fails with the following error: int kvm_vfio_ops_init(void) ^ In file included from arch/powerpc/kvm/../../../virt/kvm/vfio.c:21:0: arch/powerpc/kvm/../../../virt/kvm/vfio.h:8:90: note: previous definition of ‘kvm_vfio_ops_init’ was here arch/powerpc/kvm/../../../virt/kvm/vfio.c:292:6: error: redefinition of ‘kvm_vfio_ops_exit’ void kvm_vfio_ops_exit(void) ^ In file included from arch/powerpc/kvm/../../../virt/kvm/vfio.c:21:0: arch/powerpc/kvm/../../../virt/kvm/vfio.h:12:91: note: previous definition of ‘kvm_vfio_ops_exit’ was here scripts/Makefile.build:258: recipe for target arch/powerpc/kvm/../../../virt/kvm/vfio.o failed make[3]: *** [arch/powerpc/kvm/../../../virt/kvm/vfio.o] Error 1 Check whether CONFIG_KVM_VFIO is set before including vfio.o in the build. Reported-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22kvm, rt: change async pagefault code locking for PREEMPT_RTRik van Riel
The async pagefault wake code can run from the idle task in exception context, so everything here needs to be made non-preemptible. Conversion to a simple wait queue and raw spinlock does the trick. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter()Lan Tianyu
The barrier also orders the write to mode from any reads to the page tables done and so update the comment. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM/x86: update the comment of memory barrier in the vcpu_enter_guest()Lan Tianyu
The barrier also orders the write to mode from any reads to the page tables done and so update the comment. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM/x86: Call smp_wmb() before increasing tlbs_dirtyLan Tianyu
Update spte before increasing tlbs_dirty to make sure no tlb flush in lost after spte is zapped. This pairs with the barrier in the kvm_flush_remote_tlbs(). Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM/x86: Replace smp_mb() with smp_store_mb/release() in the ↵Lan Tianyu
walk_shadow_page_lockless_begin/end() Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()Lan Tianyu
There is already a barrier inside of kvm_flush_remote_tlbs() which can help to make sure everyone sees our modifications to the page tables and see changes to vcpu->mode here. So remove the smp_mb in the kvm_mmu_commit_zap_page() and update the comment. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM, pkeys: expose CPUID/CR4 to guestHuaitong Han
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation: CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys, enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE(bit 22). This patch disables CPUID:PKU without ept, because pkeys is not yet implemented for shadow paging. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM, pkeys: add pkeys support for permission_faultHuaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of leaf entries of the page tables, the PKEY is an index to PKRU register(16 domains), every domain has 2 bits(write disable bit, access disable bit). Static logic has been produced in update_pkru_bitmask, dynamic logic need read pkey from page table entries, get pkru value, and deduce the correct result. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM, pkeys: introduce pkru_mask to cache conditionsHuaitong Han
PKEYS defines a new status bit in the PFEC. PFEC.PK (bit 5), if some conditions is true, the fault is considered as a PKU violation. pkru_mask indicates if we need to check PKRU.ADi and PKRU.WDi, and does cache some conditions for permission_fault. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM, pkeys: save/restore PKRU when guest/host switchesXiao Guangrong
Currently XSAVE state of host is not restored after VM-exit and PKRU is managed by XSAVE so the PKRU from guest is still controlling the memory access even if the CPU is running the code of host. This is not safe as KVM needs to access the memory of userspace (e,g QEMU) to do some emulation. So we save/restore PKRU when guest/host switches. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22x86: pkey: introduce write_pkru() for KVMXiao Guangrong
KVM will use it to switch pkru between guest and host. CC: Ingo Molnar <mingo@redhat.com> CC: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM, pkeys: add pkeys support for xsave stateHuaitong Han
This patch adds pkeys support for xsave state. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM, pkeys: disable pkeys for guests in non-paging modeHuaitong Han
Pkeys is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging, mode with TDP. To emulate this behavior, pkeys needs to be manually disabled when guest switches to non-paging mode. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: x86: remove magic number with enum cpuid_leafsHuaitong Han
This patch removes magic number with enum cpuid_leafs. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: MMU: return page fault error code from permission_faultPaolo Bonzini
This will help in the implementation of PKRU, where the PK bit of the page fault error code cannot be computed in advance (unlike I/D, R/W and U/S). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: PPC: Create a virtual-mode only TCE table handlersAlexey Kardashevskiy
Upcoming in-kernel VFIO acceleration needs different handling in real and virtual modes which makes it hard to support both modes in the same handler. This creates a copy of kvmppc_rm_h_stuff_tce and kvmppc_rm_h_put_tce in addition to the existing kvmppc_rm_h_put_tce_indirect. This also fixes linker breakage when only PR KVM was selected (leaving HV KVM off): the kvmppc_h_put_tce/kvmppc_h_stuff_tce functions would not compile at all and the linked would fail. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: VMX: fix nested vpid for old KVM guestsPaolo Bonzini
Old KVM guests invoke single-context invvpid without actually checking whether it is supported. This was fixed by commit 518c8ae ("KVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction", 2010-08-01) and the patch after, but pre-2.6.36 kernels lack it including RHEL 6. Reported-by: jmontleo@redhat.com Tested-by: jmontleo@redhat.com Cc: stable@vger.kernel.org Fixes: 99b83ac893b84ed1a62ad6d1f2b6cc32026b9e85 Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: VMX: avoid guest hang on invalid invvpid instructionPaolo Bonzini
A guest executing an invalid invvpid instruction would hang because the instruction pointer was not updated. Reported-by: jmontleo@redhat.com Tested-by: jmontleo@redhat.com Cc: stable@vger.kernel.org Fixes: 99b83ac893b84ed1a62ad6d1f2b6cc32026b9e85 Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22KVM: VMX: avoid guest hang on invalid invept instructionPaolo Bonzini
A guest executing an invalid invept instruction would hang because the instruction pointer was not updated. Cc: stable@vger.kernel.org Fixes: bfd0a56b90005f8c8a004baf407ad90045c2b11e Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22ARM: dts: at91: sama5d4 Xplained: don't disable hsmci regulatorLudovic Desroches
If enabling the hsmci regulator on card detection, the board can reboot on sd card insertion. Keeping the regulator always enabled fixes this issue. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Fixes: 8d545f32bd77 ("ARM: at91/dt: sama5d4 xplained: add regulators for v(q)mmc1 supplies") Cc: stable@vger.kernel.org #4.3 and later Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2016-03-22ARM: dts: at91: sama5d3 Xplained: don't disable hsmci regulatorLudovic Desroches
If enabling the hsmci regulator on card detection, the board can reboot on sd card insertion. Keeping the regulator always enabled fixes this issue. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Fixes: 1b53e3416dd0 ("ARM: at91/dt: sama5d3 xplained: add fixed regulator for vmmc0") Cc: stable@vger.kernel.org #4.3 and later Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2016-03-21Merge tag 'arm64-perf' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm[64] perf updates from Will Deacon: "I have another mixed bag of ARM-related perf patches here. It's about 25% CPU and 75% interconnect, but with drivers/bus/ languishing without an obvious maintainer or tree, Olof and I agreed to keep all of these PMU patches together. I suspect a whole load of code from drivers/bus/arm-* can be moved under drivers/perf/, so that's on the radar for the future. Summary: - Initial support for ARMv8.1 CPU PMUs - Support for the CPU PMU in Cavium ThunderX - CPU PMU support for systems running 32-bit Linux in secure mode - Support for the system PMU in ARM CCI-550 (Cache Coherent Interconnect)" * tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (26 commits) drivers/perf: arm_pmu: avoid NULL dereference when not using devicetree arm64: perf: Extend ARMV8_EVTYPE_MASK to include PMCR.LC arm-cci: remove unused variable arm-cci: don't return value from void function arm-cci: make private functions static arm-cci: CoreLink CCI-550 PMU driver arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU arm-cci: CCI-500: Work around PMU counter writes arm-cci: Provide hook for writing to PMU counters arm-cci: Add helper to enable PMU without synchornising counters arm-cci: Add routines to save/restore all counters arm-cci: Get the status of a counter arm-cci: write_counter: Remove redundant check arm-cci: Delay PMU counter writes to pmu::pmu_enable arm-cci: Refactor CCI PMU enable/disable methods arm-cci: Group writes to counter arm-cci: fix handling cpumask_any_but return value arm-cci: simplify sysfs attr handling drivers/perf: arm_pmu: implement CPU_PM notifier arm64: dts: Add Cavium ThunderX specific PMU ...
2016-03-21Merge tag 'arc-4.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC architecture updates from Vineet Gupta: - Big Endian io accessors fix [Lada] - Spellos fixes [Adam] - Fix for DW GMAC breakage [Alexey] - Making DMA API 64-bit ready - Shutting up -Wmaybe-uninitialized noise for ARC - Other minor fixes here and there, comments update * tag 'arc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (21 commits) ARCv2: ioremap: Support dynamic peripheral address space ARC: dma: reintroduce platform specific dma<->phys ARC: dma: ioremap: use phys_addr_t consistenctly in code paths ARC: dma: pass_phys() not sg_virt() to cache ops ARC: dma: non-coherent pages need V-P mapping if in HIGHMEM ARC: dma: Use struct page based page allocator helpers ARC: build: Turn off -Wmaybe-uninitialized for ARC gcc 4.8 ARC: [plat-axs10x] add Ethernet PHY description in .dts arc: use of_platform_default_populate() to populate default bus ARC: thp: unbork !CONFIG_TRANSPARENT_HUGEPAGE build arc: [plat-nsimosci*] use ezchip network driver ARCv2: LLSC: software backoff is NOT needed starting HS2.1c ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern ARC: [plat-nsim] document ranges ARC: build: Better way to detect ISA compatible toolchain ARCv2: Allow enabling PAE40 w/o HIGHMEM ARC: [BE] readl()/writel() to work in Big Endian CPU configuration ARC: [*defconfig] No need to specify CONFIG_CROSS_COMPILE ARC: [BE] Select correct CROSS_COMPILE prefix ARC: bitops: Remove non relevant comments ...
2016-03-21xen: audit usages of module.h ; remove unnecessary instancesPaul Gortmaker
Code that uses no modular facilities whatsoever should not be sourcing module.h at all, since that header drags in a bunch of other headers with it. Similarly, code that is not explicitly using modular facilities like module_init() but only is declaring module_param setup variables should be using moduleparam.h and not the larger module.h file for that. In making this change, we also uncover an implicit use of BUG() in inline fcns within arch/arm/include/asm/xen/hypercall.h so we explicitly source <linux/bug.h> for that file now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21Merge branches 'arm/rockchip', 'arm/exynos', 'arm/smmu', 'arm/mediatek', ↵Joerg Roedel
'arm/io-pgtable', 'arm/renesas' and 'core' into next
2016-03-21kvm: arm64: Disable compiler instrumentation for hypervisor codeCatalin Marinas
With the recent rewrite of the arm64 KVM hypervisor code in C, enabling certain options like KASAN would allow the compiler to generate memory accesses or function calls to addresses not mapped at EL2. This patch disables the compiler instrumentation on the arm64 hypervisor code for gcov-based profiling (GCOV_KERNEL), undefined behaviour sanity checker (UBSAN) and kernel address sanitizer (KASAN). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: <stable@vger.kernel.org> # 4.5+ Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21arm64: Split pr_notice("Virtual kernel memory layout...") into multiple ↵Catalin Marinas
pr_cont() The printk() implementation has a limit of LOG_LINE_MAX (== 1024 - 32) buffer per call which the arm64 mem_init() breaches when printing the virtual memory layout with CONFIG_KASAN enabled. The result is that the last line is no longer printed. This patch splits the call into a pr_notice() + additional pr_cont() calls. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2016-03-21arm64: drop unused __local_flush_icache_all()Kefeng Wang
After commit 65da0a8e34a8 ("arm64: use non-global mappings for UEFI runtime regions"), nobody use __local_flush_icache_all() anymore, so drop it. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21arm64: fix KASLR boot-time I-cache maintenanceMark Rutland
Commit f80fb3a3d50843a4 ("arm64: add support for kernel ASLR") missed a DSB necessary to complete I-cache maintenance in the primary boot path, and hence stale instructions may still be present in the I-cache and may be executed until the I-cache maintenance naturally completes. Since commit 8ec41987436d566f ("arm64: mm: ensure patched kernel text is fetched from PoU"), all CPUs invalidate their I-caches after their MMU is enabled. Prior a CPU's MMU having been enabled, arbitrary lines may have been fetched from the PoC into I-caches. We never patch text expected to be executed with the MMU off. Thus, it is unnecessary to perform broadcast I-cache maintenance in the primary boot path. This patch reduces the scope of the I-cache maintenance to the local CPU, and adds the missing DSB with similar scope, matching prior maintenance in the primary boot path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesehvuel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21arm64/kernel: fix incorrect EL0 check in inv_entry macroArd Biesheuvel
The implementation of macro inv_entry refers to its 'el' argument without the required leading backslash, which results in an undefined symbol 'el' to be passed into the kernel_entry macro rather than the index of the exception level as intended. This undefined symbol strangely enough does not result in build failures, although it is visible in vmlinux: $ nm -n vmlinux |head U el 0000000000000000 A _kernel_flags_le_hi32 0000000000000000 A _kernel_offset_le_hi32 0000000000000000 A _kernel_size_le_hi32 000000000000000a A _kernel_flags_le_lo32 ..... However, it does result in incorrect code being generated for invalid exceptions taken from EL0, since the argument check in kernel_entry assumes EL1 if its argument does not equal '0'. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21perf/x86/intel/rapl: Add missing Broadwell modelsSrinivas Pandruvada
Added Broadwell-H and Broadwell-Server. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1458517938-25308-1-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/intel/uncore: Remove ev_sel_ext bit support for PCUKan Liang
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite it being documented in the SDM, if we write 1 to that bit then we can get a #GP fault. Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing. Also, there are no public events which use that bit, so remove ev_sel_ext bit support for PCU. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21arm64: KVM: Turn kvm_ksym_ref into a NOP on VHEMarc Zyngier
When running with VHE, there is no need to translate kernel pointers to the EL2 memory space, since we're already there (and we have a much saner memory map to start with). Unfortunately, kvm_ksym_ref is getting in the way, and the first call into the "hypervisor" section is going to end up in fireworks, since we're now branching into nowhereland. Meh. A potential solution is to test if VHE is engaged or not, and only perform the translation in the negative case. With this in place, VHE is able to run again. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21KVM: arm/arm64: disable preemption when calling smp_call_function_manyEric Auger
Preemption must be disabled when calling smp_call_function_many Reported-by: bartosz.wawrzyniak@tieto.com Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21perf/x86/amd/power: Add AMD accumulated power reporting mechanismHuang Rui
Introduce an AMD accumlated power reporting mechanism for the Family 15h, Model 60h processor that can be used to calculate the average power consumed by a processor during a measurement interval. The feature support is indicated by CPUID Fn8000_0007_EDX[12]. This feature will be implemented both in hwmon and perf. The current design provides one event to report per package/processor power consumption by counting each compute unit power value. Here the gory details of how the computation is done: * Tsample: compute unit power accumulator sample period * Tref: the PTSC counter period (PTSC: performance timestamp counter) * N: the ratio of compute unit power accumulator sample period to the PTSC period * Jmax: max compute unit accumulated power which is indicated by MSR_C001007b[MaxCpuSwPwrAcc] * Jx/Jy: compute unit accumulated power which is indicated by MSR_C001007a[CpuSwPwrAcc] * Tx/Ty: the value of performance timestamp counter which is indicated by CU_PTSC MSR_C0010280[PTSC] * PwrCPUave: CPU average power i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007. N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]]. ii. Read the full range of the cumulative energy value from the new MSR MaxCpuSwPwrAcc. Jmax = value returned. iii. At time x, software reads CpuSwPwrAcc and samples the PTSC. Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC. iv. At time y, software reads CpuSwPwrAcc and samples the PTSC. Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC. v. Calculate the average power consumption for a compute unit over time period (y-x). Unit of result is uWatt: if (Jy < Jx) // Rollover has occurred Jdelta = (Jy + Jmax) - Jx else Jdelta = Jy - Jx PwrCPUave = N * Jdelta * 1000 / (Ty - Tx) Simple example: root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4 CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL scripts/checksyscalls.sh CHK include/generated/compile.h SKIPPED include/generated/compile.h Building modules, stage 2. Kernel: arch/x86/boot/bzImage is ready (#40) MODPOST 4225 modules Performance counter stats for 'system wide': 183.44 mWatts power/power-pkg/ 341.837270111 seconds time elapsed root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10 Performance counter stats for 'system wide': 0.18 mWatts power/power-pkg/ 10.012551815 seconds time elapsed Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Ingo Molnar <mingo@kernel.org> Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jacob.w.shin@gmail.com Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com [ Fixed the modular build. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flagHuang Rui
AMD CPU family 15h model 0x60 introduces a mechanism for measuring accumulated power. It is used to report the processor power consumption and support for it is indicated by CPUID Fn8000_0007_EDX[12]. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wan Zongshun <Vincent.Wan@amd.com> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com [ Resolved conflict and moved the synthetic CPUID slot to 19. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/amd: Add support for new IOMMU performance eventsSuravee Suthikulpanit
This patch adds new IOMMU performance event based on the information in table 74 of the AMD I/O Virtualization Technology (IOMMU) Specification (Document Id: 4882, Rev 2.62, Feb 2015) Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joerg Roedel <jroedel@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Cc: <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://support.amd.com/TechDocs/48882_IOMMU.pdf Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/amd: Move nodes_per_socket into bsp_init_amd()Huang Rui
nodes_per_socket is static and it needn't be initialized many times during every CPU core init. So move its initialization into bsp_init_amd(). Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-2-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/cqm: Factor out some common codePeter Zijlstra
Having the same code twice (and once quite ugly) is fragile. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Add support for MBM counter overflow handlingVikas Shivappa
This patch adds a per package timer which periodically updates the memory bandwidth counters for the events that are currently active. Current patch has a periodic timer every 1s since the SDM guarantees that the counter will not overflow in 1s but this time can be definitely improved by calibrating on the system. The overflow is really a function of the max memory b/w that the socket can support, max counter value and scaling factor. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/013b756c5006b1c4ca411f3ecf43ed52f19fbf87.1457723885.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Implement RMID recyclingVikas Shivappa
RMID could be allocated or deallocated as part of RMID recycling. When an RMID is allocated for MBM event, the MBM counter needs to be initialized because next time we read the counter we need the previous value to account for total bytes that went to the memory controller. Similarly, when RMID is deallocated we need to update the ->count variable. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-6-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Add memory bandwidth monitoring event managementTony Luck
Includes all the core infrastructure to measure the total_bytes and bandwidth. We have per socket counters for both total system wide L3 external bytes and local socket memory-controller bytes. The OS does MSR writes to MSR_IA32_QM_EVTSEL and MSR_IA32_QM_CTR to read the counters and uses the IA32_PQR_ASSOC_MSR to associate the RMID with the task. The tasks have a common RMID for CQM (cache quality of service monitoring) and MBM. Hence most of the scheduling code is reused from CQM. Signed-off-by: Tony Luck <tony.luck@intel.com> [ Restructured rmid_read to not have an obvious hole, removed MBM_CNTR_MAX as its unused. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/abd7aac9a18d93b95b985b931cf258df0164746d.1457723885.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and initVikas Shivappa
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring) and initializes the perf events and datastructures for monitoring the memory b/w. Its based on original patch series by Tony Luck and Kanaka Juvva. Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor bandwidth from one level of cache to another. The current patches support L3 external bandwidth monitoring. It supports both 'local bandwidth' and 'total bandwidth' monitoring for the socket. Local bandwidth measures the amount of data sent through the memory controller on the socket and total b/w measures the total system bandwidth. Extending the cache quality of service monitoring (CQM) we add two more events to the perf infrastructure: intel_cqm_llc/local_bytes - bytes sent through local socket memory controller intel_cqm_llc/total_bytes - total L3 external bytes sent The tasks are associated with a Resouce Monitoring ID (RMID) just like in CQM and OS uses a MSR write to indicate the RMID of the task during scheduling. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/cqm: Fix CQM memory leak and notifier leakVikas Shivappa
Fixes the hotcpu notifier leak and other global variable memory leaks during CQM (cache quality of service monitoring) initialization. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/cqm: Fix CQM handling of grouping events into a cache_groupVikas Shivappa
Currently CQM (cache quality of service monitoring) is grouping all events belonging to same PID to use one RMID. However its not counting all of these different events. Hence we end up with a count of zero for all events other than the group leader. The patch tries to address the issue by keeping a flag in the perf_event.hw which has other CQM related fields. The field is updated at event creation and during grouping. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> [peterz: Changed hw_perf_event::is_group_event to an int] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-2-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/BTS: Fix RCU usagePeter Zijlstra
This splat reminds us: [ 8166.045595] [ INFO: suspicious RCU usage. ] [ 8166.168972] [<ffffffff81127837>] lockdep_rcu_suspicious+0xe7/0x120 [ 8166.175966] [<ffffffff811e0bae>] perf_callchain+0x23e/0x250 [ 8166.182280] [<ffffffff811dda3d>] perf_prepare_sample+0x27d/0x350 [ 8166.189082] [<ffffffff8100f503>] intel_pmu_drain_bts_buffer+0x133/0x200 ... that as the core code does, one should hold rcu_read_lock() over that entire BTS event-output generation sequence as well. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/ibs: Add IBS interrupt to the dynamic throttlePeter Zijlstra
Interrupt throttling is normally only done against sysctl_perf_event_sample_rate. This means that if that number is too high (for whatever reason) you can lock up your machine. We have, however, a dynamic throttling scheme too, but for that to work, we need to add a callback to the interrupt handler, IBS did not have this, so add it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/ibs: Fix race with IBS_STARTING statePeter Zijlstra
While tracing the IBS bits I saw the NMI hitting between clearing IBS_STARTING and the actual MSR writes to disable the counter. Since IBS_STARTING was cleared, the handler assumed these were spurious NMIs and because STOPPING wasn't set yet either, insta-triggered an "Unknown NMI". Cure this by clearing IBS_STARTING after disabling the hardware. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>