summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)Author
2013-07-04Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - various misc bits - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
2013-07-03mm/x86: prepare for removing num_physpages and simplify mem_init()Jiang Liu
Prepare for removing num_physpages and simplify mem_init(). Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03Merge tag 'please-pull-mce-therm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull thermal power-limit update from Tony Luck: "Thermal limit warnings are too scary and cause unnecessary concern" * tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: x86 thermal: Disable power limit notification interrupt by default x86 thermal: Delete power-limit-notification console messages
2013-07-02Merge branch 'x86-tracing-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tracing updates from Ingo Molnar: "This tree adds IRQ vector tracepoints that are named after the handler and which output the vector #, based on a zero-overhead approach that relies on changing the IDT entries, by Seiji Aguchi. The new tracepoints look like this: # perf list | grep -i irq_vector irq_vectors:local_timer_entry [Tracepoint event] irq_vectors:local_timer_exit [Tracepoint event] irq_vectors:reschedule_entry [Tracepoint event] irq_vectors:reschedule_exit [Tracepoint event] irq_vectors:spurious_apic_entry [Tracepoint event] irq_vectors:spurious_apic_exit [Tracepoint event] irq_vectors:error_apic_entry [Tracepoint event] irq_vectors:error_apic_exit [Tracepoint event] [...]" * 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tracing: Add config option checking to the definitions of mce handlers trace,x86: Do not call local_irq_save() in load_current_idt() trace,x86: Move creation of irq tracepoints from apic.c to irq.c x86, trace: Add irq vector tracepoints x86: Rename variables for debugging x86, trace: Introduce entering/exiting_irq() tracing: Add DEFINE_EVENT_FN() macro
2013-07-02Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The changes in this tree are: - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen Gong - misc MCE fixes/cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Update MCE severity condition check mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem ACPI/APEI: Update einj documentation for param1/param2 ACPI/APEI: Add parameter check before error injection ACPI, APEI, EINJ: Fix error return code in einj_init() x86, mce: Fix "braodcast" typo
2013-07-02Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc improvements: - Fix /proc/mtrr reporting - Fix ioremap printout - Remove the unused pvclock fixmap entry on 32-bit - misc cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioremap: Correct function name output x86: Fix /proc/mtrr with base/size more than 44bits ix86: Don't waste fixmap entries x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h> x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU changes from Ingo Molnar: "There are two bigger changes in this tree: - Add an [early-use-]safe static_cpu_has() variant and other robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS configurable debugging facility, motivated by recent obscure FPU code bugs, by Borislav Petkov - Reimplement FPU detection code in C and drop the old asm code, by Peter Anvin." * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: Use static_cpu_has_safe before alternatives x86: Add a static_cpu_has_safe variant x86: Sanity-check static_cpu_has usage x86, cpu: Add a synthetic, always true, cpu feature x86: Get rid of ->hard_math and all the FPU asm fu
2013-07-02Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "Two changes: - Extend 32-bit double fault debugging aid to 64-bit - Fix a build warning" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/cacheinfo: Shut up last long-standing warning x86: Extend #DF debugging aid to 64-bit
2013-07-02Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull asm/x86 changes from Ingo Molnar: "Misc changes, with a bigger processor-flags cleanup/reorganization by Peter Anvin" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, asm, cleanup: Replace open-coded control register values with symbolic x86, processor-flags: Fix the datatypes and add bit number defines x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED linux/const.h: Add _BITUL() and _BITULL() x86/vdso: Convert use of typedef ctl_table to struct ctl_table x86: __force_order doesn't need to be an actual variable
2013-07-02Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel improvements: - watchdog driver improvements by Li Zefan - Power7 CPI stack events related improvements by Sukadev Bhattiprolu - event multiplexing via hrtimers and other improvements by Stephane Eranian - kernel stack use optimization by Andrew Hunter - AMD IOMMU uncore PMU support by Suravee Suthikulpanit - NMI handling rate-limits by Dave Hansen - various hw_breakpoint fixes by Oleg Nesterov - hw_breakpoint overflow period sampling and related signal handling fixes by Jiri Olsa - Intel Haswell PMU support by Andi Kleen Tooling improvements: - Reset SIGTERM handler in workload child process, fix from David Ahern. - Makefile reorganization, prep work for Kconfig patches, from Jiri Olsa. - Add automated make test suite, from Jiri Olsa. - Add --percent-limit option to 'top' and 'report', from Namhyung Kim. - Sorting improvements, from Namhyung Kim. - Expand definition of sysfs format attribute, from Michael Ellerman. Tooling fixes: - 'perf tests' fixes from Jiri Olsa. - Make Power7 CPI stack events available in sysfs, from Sukadev Bhattiprolu. - Handle death by SIGTERM in 'perf record', fix from David Ahern. - Fix printing of perf_event_paranoid message, from David Ahern. - Handle realloc failures in 'perf kvm', from David Ahern. - Fix divide by 0 in variance, from David Ahern. - Save parent pid in thread struct, from David Ahern. - Handle JITed code in shared memory, from Andi Kleen. - Fixes for 'perf diff', from Jiri Olsa. - Remove some unused struct members, from Jiri Olsa. - Add missing liblk.a dependency for python/perf.so, fix from Jiri Olsa. - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent. - No need to do locking when adding hists in perf report, only 'top' needs that, from Namhyung Kim. - Fix alignment of symbol column in in the hists browser (top, report) when -v is given, from NAmhyung Kim. - Fix 'perf top' -E option behavior, from Namhyung Kim. - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu. - Fix compile errors in bp_signal 'perf test', from Sukadev Bhattiprolu. ... and more things" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits) perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable() perf/x86: Fix shared register mutual exclusion enforcement perf/x86/intel: Support full width counting x86: Add NMI duration tracepoints perf: Drop sample rate when sampling is too slow x86: Warn when NMI handlers take large amounts of time hw_breakpoint: Introduce "struct bp_cpuinfo" hw_breakpoint: Simplify *register_wide_hw_breakpoint() hw_breakpoint: Introduce cpumask_of_bp() hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths perf/x86/intel: Add mem-loads/stores support for Haswell perf/x86/intel: Support Haswell/v4 LBR format perf/x86/intel: Move NMI clearing to end of PMI handler perf/x86/intel: Add Haswell PEBS support perf/x86/intel: Add simple Haswell PMU support perf/x86/intel: Add Haswell PEBS record support perf/x86/intel: Fix sparse warning perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation perf/x86/amd: Add IOMMU Performance Counter resource management ...
2013-06-28Merge tag 'please-pull-mce' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE cleanup from Tony Luck: "Changes to simplify the SDM means that we can also simplify the code for SRAR (software recoverable action required) errors." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27x86/mce: Update MCE severity condition checkChen Gong
Update some SRAR severity conditions check to make it clearer, according to latest Intel SDM Vol 3(June 2013), table 15-20. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27Merge tag 'v3.10-rc7' into drm-nextDave Airlie
Linux 3.10-rc7 The sdvo lvds fix in this -fixes pull commit c3456fb3e4712d0448592af3c5d644c9472cd3c1 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Mon Jun 10 09:47:58 2013 +0200 drm/i915: prefer VBT modes for SVDO-LVDS over EDID has a silent functional conflict with commit 990256aec2f10800595dddf4d1c3441fcd6b2616 Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Fri May 31 12:17:07 2013 +0000 drm: Add probed modes in probe order in drm-next. W simply need to add the vbt modes before edid modes, i.e. the other way round than now. Conflicts: drivers/gpu/drm/drm_prime.c drivers/gpu/drm/i915/intel_sdvo.c
2013-06-26perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()Stephane Eranian
Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable() are symmetrical w.r.t. PEBS-LL and precise store. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26perf/x86: Fix shared register mutual exclusion enforcementStephane Eranian
This patch fixes a problem with the shared registers mutual exclusion code and incremental event scheduling by the generic perf_event code. There was a bug whereby the mutual exclusion on the shared registers was not enforced because of incremental scheduling abort due to event constraints. As an example on Intel Nehalem, consider the following events: group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE group2= L1D_CACHE_LD:I_STATE The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there are 3 instances here. The first group can be scheduled and is committed. Then, the generic code tries to schedule group2 and this fails (because there is no more counter to support the 3rd instance of L1D_CACHE_LD). But in x86_schedule_events() error path, put_event_contraints() is invoked on ALL the events and not just the ones that just failed. That causes the "lock" on the shared offcore_response MSR to be released. Yet the first group is actually scheduled and is exposed to reprogramming of that shared msr by the sibling HT thread. In other words, there is no guarantee on what is measured. This patch fixes the problem by tagging committed events with the PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(), only the events NOT tagged have their constraint released. The tag is eventually removed when the event in descheduled. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26perf/x86/intel: Support full width countingAndi Kleen
Recent Intel CPUs like Haswell and IvyBridge have a new alternative MSR range for perfctrs that allows writing the full counter width. Enable this range if the hardware reports it using a new capability bit. Currently the perf code queries CPUID to get the counter width, and sign extends the counter values as needed. The traditional PERFCTR MSRs always limit to 32bit, even though the counter internally is larger (usually 48 bits on recent CPUs) When the new capability is set use the alternative range which do not have these restrictions. This lowers the overhead of perf stat slightly because it has to do less interrupts to accumulate the counter value. On Haswell it also avoids some problems with TSX aborting when the end of the counter range is reached. ( See the patch "perf/x86/intel: Avoid checkpointed counters causing excessive TSX aborts" for more details. ) Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26Merge tag 'please-pull-mce-bitmap-comment' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE updates from Tony Luck: "Better comments so we understand our existing machine check bank bitmaps - prelude to adding another bitmap soon." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-25x86, asm, cleanup: Replace open-coded control register values with symbolicH. Peter Anvin
Clean up an unnecessary open-coded control register values. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
2013-06-25mce: acpi/apei: Add comments to clarify usage of the various bitfields in ↵Naveen N. Rao
the MCA subsystem There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned' per-cpu bitmaps. Provide comments so that we all know exactly what these are used for, and why. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25x86: Fix /proc/mtrr with base/size more than 44bitsYinghai Lu
On one sytem that mtrr range is more then 44bits, in dmesg we have [ 0.000000] MTRR default type: write-back [ 0.000000] MTRR fixed ranges enabled: [ 0.000000] 00000-9FFFF write-back [ 0.000000] A0000-BFFFF uncachable [ 0.000000] C0000-DFFFF write-through [ 0.000000] E0000-FFFFF write-protect [ 0.000000] MTRR variable ranges enabled: [ 0.000000] 0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable [ 0.000000] 1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable [ 0.000000] 2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through [ 0.000000] 3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through [ 0.000000] 4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through [ 0.000000] 5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through [ 0.000000] 6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through [ 0.000000] 7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through [ 0.000000] 8 disabled [ 0.000000] 9 disabled but /proc/mtrr report wrong: reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable reg02: base=0x099000000 ( 2448MB), size= 16MB, count=1: write-through reg03: base=0x09a000000 ( 2464MB), size= 16MB, count=1: write-through reg04: base=0x81ffa000000 (8519584MB), size= 32MB, count=1: write-through reg05: base=0x81ffc000000 (8519616MB), size= 1MB, count=1: write-through reg06: base=0x0ad000000 ( 2768MB), size= 16MB, count=1: write-through reg07: base=0x0bd000000 ( 3024MB), size= 16MB, count=1: write-through reg08: base=0x09b000000 ( 2480MB), size= 16MB, count=1: write-combining so bit 44 and bit 45 get cut off. We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr(). 1. for base, we miss cast base_lo to 64bit before shifting. Fix that by adding u64 casting. 2. for size, it only can handle 44 bits aka 32bits + page_shift Fix that with 64bit mask instead of 32bit mask_lo, then range could be more than 44bits. At the same time, we need to update size_or_mask for old cpus that does support cpuid 0x80000008 to get phys_addr. Need to set high 32bits to all 1s, otherwise will not get correct size for them. Also fix mtrr_add_page: it should check base and (base + size - 1) instead of base and size, as base and size could be small but base + size could bigger enough to be out of boundary. We can use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask. So When are we going to have size more than 44bits? that is 16TiB. after patch we have right ouput: reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable reg02: base=0x099000000 ( 2448MB), size= 16MB, count=1: write-through reg03: base=0x09a000000 ( 2464MB), size= 16MB, count=1: write-through reg04: base=0x381ffa000000 (58851232MB), size= 32MB, count=1: write-through reg05: base=0x381ffc000000 (58851264MB), size= 1MB, count=1: write-through reg06: base=0x0ad000000 ( 2768MB), size= 16MB, count=1: write-through reg07: base=0x0bd000000 ( 3024MB), size= 16MB, count=1: write-through reg08: base=0x09b000000 ( 2480MB), size= 16MB, count=1: write-combining -v2: simply checking in mtrr_add_page according to hpa. [ hpa: This probably wants to go into -stable only after having sat in mainline for a bit. It is not a regression. ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org Cc: <stable@vger.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-23perf: Drop sample rate when sampling is too slowDave Hansen
This patch keeps track of how long perf's NMI handler is taking, and also calculates how many samples perf can take a second. If the sample length times the expected max number of samples exceeds a configurable threshold, it drops the sample rate. This way, we don't have a runaway sampling process eating up the CPU. This patch can tend to drop the sample rate down to level where perf doesn't work very well. *BUT* the alternative is that my system hangs because it spends all of its time handling NMIs. I'll take a busted performance tool over an entire system that's busted and undebuggable any day. BTW, my suspicion is that there's still an underlying bug here. Using the HPET instead of the TSC is definitely a contributing factor, but I suspect there are some other things going on. But, I can't go dig down on a bug like that with my machine hanging all the time. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org Cc: acme@ghostprotocols.net Cc: Dave Hansen <dave@sr71.net> [ Prettified it a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-21Merge branch 'x86/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This series fixes a couple of build failures, and fixes MTRR cleanup and memory setup on very specific memory maps. Finally, it fixes triggering backtraces on all CPUs, which was inadvertently disabled on x86." * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Fix dummy variable buffer allocation x86: Fix trigger_all_cpu_backtrace() implementation x86: Fix section mismatch on load_ucode_ap x86: fix build error and kconfig for ia32_emulation and binfmt range: Do not add new blank slot with add_range_with_merge x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21x86, trace: Add irq vector tracepointsSeiji Aguchi
[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-21x86: Rename variables for debuggingSeiji Aguchi
Rename variables for debugging to describe meaning of them precisely. Also, introduce a generic way to switch IDT by checking a current state, debug on/off. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-21x86, trace: Introduce entering/exiting_irq()Seiji Aguchi
When implementing tracepoints in interrupt handers, if the tracepoints are simply added in the performance sensitive path of interrupt handers, it may cause potential performance problem due to the time penalty. To solve the problem, an idea is to prepare non-trace/trace irq handers and switch their IDTs at the enabling/disabling time. So, let's introduce entering_irq()/exiting_irq() for pre/post- processing of each irq handler. A way to use them is as follows. Non-trace irq handler: smp_irq_handler() { entering_irq(); /* pre-processing of this handler */ __smp_irq_handler(); /* * common logic between non-trace and trace handlers * in a vector. */ exiting_irq(); /* post-processing of this handler */ } Trace irq_handler: smp_trace_irq_handler() { entering_irq(); /* pre-processing of this handler */ trace_irq_entry(); /* tracepoint for irq entry */ __smp_irq_handler(); /* * common logic between non-trace and trace handlers * in a vector. */ trace_irq_exit(); /* tracepoint for irq exit */ exiting_irq(); /* post-processing of this handler */ } If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks cleaner. smp_trace_irq_handler() { trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); } But it doesn't work. The problem is with irq_enter/exit() being called. They must be called before trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any tracepoints are used, as tracepoints use rcu to synchronize. As a possible alternative, we may be able to call irq_enter() first as follows if irq_enter() can nest. smp_trace_irq_hander() { irq_entry(); trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); irq_exit(); } But it doesn't work, either. If irq_enter() is nested, it may have a time penalty because it has to check if it was already called or not. The time penalty is not desired in performance sensitive paths even if it is tiny. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-21x86: Add a static_cpu_has_safe variantBorislav Petkov
We want to use this in early code where alternatives might not have run yet and for that case we fall back to the dynamic boot_cpu_has. For that, force a 5-byte jump since the compiler could be generating differently sized jumps for each label. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-21x86: Sanity-check static_cpu_has usageBorislav Petkov
static_cpu_has may be used only after alternatives have run. Before that it always returns false if constant folding with __builtin_constant_p() doesn't happen. And you don't want that. This patch is the result of me debugging an issue where I overzealously put static_cpu_has in code which executed before alternatives have run and had to spend some time with scratching head and cursing at the monitor. So add a jump to a warning which screams loudly when we use this function too early. The alternatives patch that check away in conjunction with patching the rest of the kernel image. [ hpa: factored this into its own configuration option. If we want to have an overarching option, it should be an option which selects other options, not as a group option in the source code. ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-21x86, cpu: Add a synthetic, always true, cpu featureBorislav Petkov
This will be used in alternatives later as an always-replace flag. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20x86/intel/cacheinfo: Shut up last long-standing warningBorislav Petkov
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’: arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized] This keeps on happening during randbuilds and the compiler is wrong here: In the case where cpuid4_cache_lookup_regs() returns 0, both this_leaf.size and this_leaf.eax get initialized. In the case where the CPUID leaf doesn't contain valid cache info, we error out which init_intel_cacheinfo() handles correctly without touching the abovementioned fields. So shut up the warning by clearing out the struct which we hand down. While at it, reverse error handling and gain one indentation level. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Add mem-loads/stores support for HaswellAndi Kleen
mem-loads is basically the same as Sandy Bridge, but we use a separate string for changes later. Haswell doesn't support the full precise store mode, so we emulate it using the "DataLA" facility. This allows to do everything, but for data sources we can only detect L1 hit or not. There is no explicit enable bit anymore, so we have to tie it to a perf internal only flag. The address is supported for all memory related PEBS events with DataLA. Instead of only logging for the load and store events we allow logging it for all (it will be simply 0 if the current event does not support it) Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Support Haswell/v4 LBR formatAndi Kleen
Haswell has two additional LBR from flags for TSX: in_tx and abort_tx, implemented as a new "v4" version of the LBR format. Handle those in and adjust the sign extension code to still correctly extend. The flags are exported similarly in the LBR record to the existing misprediction flag Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Move NMI clearing to end of PMI handlerAndi Kleen
This avoids some problems with spurious PMIs on Haswell. Haswell seems to behave more like P4 in this regard. Do the same thing as the P4 perf handler by unmasking the NMI only at the end. Shouldn't make any difference for earlier family 6 cores. (Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).) Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Add Haswell PEBS supportAndi Kleen
Add simple PEBS support for Haswell. The constraints are similar to SandyBridge with a few new events. Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Add simple Haswell PMU supportAndi Kleen
Similar to SandyBridge, but has a few new events and two new counter bits. There are some new counter flags that need to be prevented from being set on fixed counters, and allowed to be set for generic counters. Also we add support for the counter 2 constraint to handle all raw events. (Contains fixes from Stephane Eranian.) Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Add Haswell PEBS record supportAndi Kleen
Add support for the Haswell extended (fmt2) PEBS format. It has a superset of the nhm (fmt1) PEBS fields, but has a longer record so we need to adjust the code paths. The main advantage is the new "EventingRip" support which directly gives the instruction, not off-by-one instruction. So with precise == 2 we use that directly and don't try to use LBRs and walking basic blocks. This lowers the overhead of using precise significantly. Some other features are added in later patches. Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andi Kleen <ak@linux.jf.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/intel: Fix sparse warningYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1370421025-10986-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementationSuravee Suthikulpanit
Implement a perf PMU to handle IOMMU performance counters and events. The PMU only supports counting mode (e.g. perf stat). Since the counters are shared across all cores, the PMU is implemented as "system-wide" mode. To invoke the AMD IOMMU PMU, issue a perf tool command such as: ./perf stat -a -e amd_iommu/<events>/ <command> or: ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command> For example: ./perf stat -a -e amd_iommu/mem_trans_total/ <command> The resulting count will be how many IOMMU total peripheral memory operations were performed during the command execution window. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86: Only print PMU state when also WARN()'ingDave Hansen
intel_pmu_handle_irq() has a warning in it if it does too many loops. It is a WARN_ONCE(), but the perf_event_print_debug() call beneath it is unconditional. For the first warning, you get a nice backtrace and message, but subsequent ones just dump the PMU state with no leading messages. I doubt this is what was intended. This patch will only print the PMU state when paired with the WARN_ON() text. It effectively open-codes WARN_ONCE()'s one-time-only logic. My suspicion is that the code really just wants to make sure we do not sit in the loop and spit out a warning for every loop iteration after the 100th. From what I've seen, this is very unlikely to happen since we also clear the PMU state. After this patch, instead of seeing the PMU state dumped each time, you will just see: [57494.894540] perf_event_intel: clearing PMU state on CPU#129 [57579.539668] perf_event_intel: clearing PMU state on CPU#10 [57587.137762] perf_event_intel: clearing PMU state on CPU#134 [57623.039912] perf_event_intel: clearing PMU state on CPU#114 [57644.559943] perf_event_intel: clearing PMU state on CPU#118 ... Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86: Reduce stack usage of x86_schedule_events()Andrew Hunter
x86_schedule_events() caches event constraints on the stack during scheduling. Given the number of possible events, this is 512 bytes of stack; since it can be invoked under schedule() under god-knows-what, this is causing stack blowouts. Trade some space usage for stack safety: add a place to cache the constraint pointer to struct perf_event. For 8 bytes per event (1% of its size) we can save the giant stack frame. This shouldn't change any aspect of scheduling whatsoever and while in theory the locality's a tiny bit worse, I doubt we'll see any performance impact either. Tested: `perf stat whatever` does not blow up and produces results that aren't hugely obviously wrong. I'm not sure how to run particularly good tests of perf code, but this should not produce any functional change whatsoever. Signed-off-by: Andrew Hunter <ahh@google.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge in the latest fixes, to avoid conflicts with ongoing work. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EPStephane Eranian
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP. For some reason, the LDLAT extra reg definition for snb_ep showed up as duplicate in the snb table. This patch moves the definition of LDLAT back into the snb_ep table. Thanks to Don Zickus for tracking this one down. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-18x86, mtrr: Fix original mtrr range get for mtrr_cleanupYinghai Lu
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge during add range) broke mtrr cleanup on his setup in 3.9.5. corresponding commit in upstream is fbe06b7bae7c. *BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G https://bugzilla.kernel.org/show_bug.cgi?id=59491 So it rejects new var mtrr layout. It turns out we have some problem with initial mtrr range retrieval. The current sequence is: x86_get_mtrr_mem_range ==> bunchs of add_range_with_merge ==> bunchs of subract_range ==> clean_sort_range add_range_with_merge for [0,1M) sort_range() add_range_with_merge could have blank slots, so we can not just sort only, that will have final result have extra blank slot in head. So move that calling add_range_with_merge for [0,1M), with that we could avoid extra clean_sort_range calling. Reported-by: Joshua Covington <joshuacov@googlemail.com> Tested-by: Joshua Covington <joshuacov@googlemail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-14x86 thermal: Disable power limit notification interrupt by defaultFenghua Yu
The package power limit notification interrupt is primarily for system diagnosis, and should not be blindly enabled on every system by default -- particuarly since Linux does nothing in the handler except count how many times it has been called... Add a new kernel cmdline parameter "int_pln_enable" for situations where users want to oberve these events via existing system counters: $ grep TRM /proc/interrupts $ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/* https://bugzilla.kernel.org/show_bug.cgi?id=36182 Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14x86 thermal: Delete power-limit-notification console messagesFenghua Yu
Package power limits are common on some systems under some conditions -- so printing console messages when limits are reached causes unnecessary customer concern and support calls. Note that even with these console messages gone, the events can still be observed via system counters: $ grep TRM /proc/interrupts Shows total thermal interrupts, which includes both power limit notifications and thermal throttling interrupts. $ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/* Will show what caused those interrupts, core and package throttling and power limit notifications. https://bugzilla.kernel.org/show_bug.cgi?id=36182 Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-13x86, mcheck, therm_throt: Process package thresholdsSrinivas Pandruvada
Added callback registration for package threshold reports. Also added a callback to check the rate control implemented in callback or not. If there is no rate control implemented, then there is a default rate control similar to core threshold notification by delaying for CHECK_INTERVAL (5 minutes) between reports. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-06-06x86: Get rid of ->hard_math and all the FPU asm fuH. Peter Anvin
Reimplement FPU detection code in C and drop old, not-so-recommended detection method in asm. Move all the relevant stuff into i387.c where it conceptually belongs. Finally drop cpuinfo_x86.hard_math. [ hpa: huge thanks to Borislav for taking my original concept patch and productizing it ] [ Boris, note to self: do not use static_cpu_has before alternatives! ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05x86, mce: Fix "braodcast" typoMathias Krause
Fix the typo in MCJ_IRQ_BRAODCAST. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-31Add arch_phys_wc_{add, del} to manipulate WC MTRRs if neededAndy Lutomirski
Several drivers currently use mtrr_add through various #ifdef guards and/or drm wrappers. The vast majority of them want to add WC MTRRs on x86 systems and don't actually need the MTRR if PAT (i.e. ioremap_wc, etc) are working. arch_phys_wc_add and arch_phys_wc_del are new functions, available on all architectures and configurations, that add WC MTRRs on x86 if needed (and handle errors) and do nothing at all otherwise. They're also easier to use than mtrr_add and mtrr_del, so the call sites can be simplified. As an added benefit, this will avoid wasting MTRRs and possibly warning pointlessly on PAT-supporting systems. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-28update AMD powerflags commentsThorsten Glaser
from http://www.flounder.com/cpuid80000007amd.gif and http://support.amd.com/us/Embedded_TechDocs/25481.pdf Signed-off-by: Thorsten Glaser <t.glaser@tarent.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28perf/x86/amd: Rework AMD PMU init codePeter Zijlstra
Josh reported that his QEMU is a bad hardware emulator and trips a WARN in the AMD PMU init code. He requested the WARN be turned into a pr_err() or similar. While there, rework the code a little. Reported-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Jacob Shin <jacob.shin@amd.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>