summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/ftrace.c
AgeCommit message (Collapse)Author
2010-02-27Merge branch 'tracing/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core
2010-02-25ftrace: Remove memory barriers from NMI code when not neededSteven Rostedt
The code in stop_machine that modifies the kernel text has a bit of logic to handle the case of NMIs. stop_machine does not prevent NMIs from executing, and if an NMI were to trigger on another CPU as the modifying CPU is changing the NMI text, a GPF could result. To prevent the GPF, the NMI calls ftrace_nmi_enter() which may modify the code first, then any other NMIs will just change the text to the same content which will do no harm. The code that stop_machine called must wait for NMIs to finish while it changes each location in the kernel. That code may also change the text to what the NMI changed it to. The key is that the text will never change content while another CPU is executing it. To make the above work, the call to ftrace_nmi_enter() must also do a smp_mb() as well as atomic_inc(). But for applications like perf that require a high number of NMIs for profiling, this can have a dramatic effect on the system. Not only is it doing a full memory barrier on both nmi_enter() as well as nmi_exit() it is also modifying a global variable with an atomic operation. This kills performance on large SMP machines. Since the memory barriers are only needed when ftrace is in the process of modifying the text (which is seldom), this patch adds a "modifying_code" variable that gets set before stop machine is executed and cleared afterwards. The NMIs will check this variable and store it in a per CPU "save_modifying_code" variable that it will use to check if it needs to do the memory barriers and atomic dec on NMI exit. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-17tracing: Unify arch_syscall_addr() implementationsMike Frysinger
Most implementations of arch_syscall_addr() are the same, so create a default version in common code and move the one piece that differs (the syscall table) to asm/syscall.h. New arch ports don't have to waste time copying & pasting this simple function. The s390/sparc versions need to be different, so document why. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1264498803-17278-1-git-send-email-vapier@gentoo.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-08Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits) x86, mm: Correct the implementation of is_untracked_pat_range() x86/pat: Trivial: don't create debugfs for memtype if pat is disabled x86, mtrr: Fix sorting of mtrr after subtracting x86: Move find_smp_config() earlier and avoid bootmem usage x86, platform: Change is_untracked_pat_range() to bool; cleanup init x86: Change is_ISA_range() into an inline function x86, mm: is_untracked_pat_range() takes a normal semiclosed range x86, mm: Call is_untracked_pat_range() rather than is_ISA_range() x86: UV SGI: Don't track GRU space in PAT x86: SGI UV: Fix BAU initialization x86, numa: Use near(er) online node instead of roundrobin for NUMA x86, numa, bootmem: Only free bootmem on NUMA failure path x86: Change crash kernel to reserve via reserve_early() x86: Eliminate redundant/contradicting cache line size config options x86: When cleaning MTRRs, do not fold WP into UC x86: remove "extern" from function prototypes in <asm/proto.h> x86, mm: Report state of NX protections during boot x86, mm: Clean up and simplify NX enablement x86, pageattr: Make set_memory_(x|nx) aware of NX support x86, sleep: Always save the value of EFER ... Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range) to 'struct x86_platform_ops') in arch/x86/include/asm/x86_init.h arch/x86/kernel/x86_init.c
2009-11-02x86_64, ftrace: Make ftrace use kernel identity mapping to modify codeSuresh Siddha
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead of the kernel text mapping to modify the kernel text. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14tracing: Move syscalls metadata handling from arch to coreFrederic Weisbecker
Most of the syscalls metadata processing is done from arch. But these operations are mostly generic accross archs. Especially now that we have a common variable name that expresses the number of syscalls supported by an arch: NR_syscalls, the only remaining bits that need to reside in arch is the syscall nr to addr translation. v2: Compare syscalls symbols only after the "sys" prefix so that we avoid spurious mismatches with archs that have syscalls wrappers, in which case syscalls symbols have "SyS" prefixed aliases. (Reported by: Heiko Carstens) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org>
2009-10-12ftrace.c: Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmtJoe Perches
- Remove prefixes from pr_<level>, use pr_fmt(fmt). No change in output. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <9b377eefae9e28c599dd4a17bdc81172965e9931.1254701151.git.joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26tracing: Convert event tracing code to use NR_syscallsJason Baron
Convert the syscalls event tracing code to use NR_syscalls, instead of FTRACE_SYSCALL_MAX. NR_syscalls is standard accross most arches, and reduces code confusion/complexity. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <9b4f1a84ecae57cc6599412772efa36f0d2b815b.1251146513.git.jbaron@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-11tracing: Add individual syscalls tracepoint id supportJason Baron
The current state of syscalls tracepoints generates only one event id for every syscall events. This patch associates an id with each syscall trace event, so that we can identify each syscall trace event using the 'perf' tool. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-11tracing: Call arch_init_ftrace_syscalls at bootJason Baron
Call arch_init_ftrace_syscalls at boot, so we can determine early the set of syscalls for the syscall trace events. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-11tracing: Map syscall name to numberJason Baron
Add a new function to support translating a syscall name to number at runtime. This allows the syscall event tracer to map syscall names to number. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-06tracing/function-graph-tracer: Drop the useless nmi protectionFrederic Weisbecker
The function graph tracer used to have a protection against NMI while entering a function entry tracing. But this is useless now, this tracer is reentrant and the ring buffer supports the NMI tracing. We can then drop this protection. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2009-06-18function-graph: add stack frame testSteven Rostedt
In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-13x86/function-graph: fix constraint for recording old return valueSteven Rostedt
After upgrading from gcc 4.2.2 to 4.4.0, the function graph tracer broke. Investigating, I found that in the asm that replaces the return value, gcc was using the same register for the old value as it was for the new value. mov (addr), old mov new, (addr) But if old and new are the same register, we clobber new with old! I first thought this was a bug in gcc 4.4.0 and reported it: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40132 Andrew Pinski responded (quickly), saying that it was correct gcc behavior and the code needed to denote old as an "early clobber". Instead of "=r"(old), we need "=&r"(old). [Impact: keep function graph tracer from breaking with gcc 4.4.0 ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-09tracing/syscalls: use a dedicated file headerFrederic Weisbecker
Impact: fix build warnings and possibe compat misbehavior on IA64 Building a kernel on ia64 might trigger these ugly build warnings: CC arch/ia64/ia32/sys_ia32.o In file included from arch/ia64/ia32/sys_ia32.c:55: arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined In file included from include/linux/elf.h:7, from include/linux/module.h:14, from include/linux/ftrace.h:8, from include/linux/syscalls.h:68, from arch/ia64/ia32/sys_ia32.c:18: arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition [...] sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h to import the syscalls tracing prototypes. But including ftrace.h can pull too much things for a low level file, especially on ia64 where the ia32 private headers conflict with higher level headers. Now we isolate the syscall tracing headers in their own lightweight file. Reported-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Michael Rubin <mrubin@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Michael Davidson <md@google.com> LKML-Reference: <20090408184058.GB6017@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07tracing, x86: remove duplicated #includeHuang Weiyi
Remove duplicated #include in arch/x86/kernel/ftrace.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> LKML-Reference: <1238503291-2532-1-git-send-email-weiyi.huang@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24function-graph: moved the timestamp from arch to generic codeSteven Rostedt
This patch move the timestamp from happening in the arch specific code into the general code. This allows for better control by the tracer to time manipulation. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-19ftrace: protect running nmi (V3)Lai Jiangshan
When I review the sensitive code ftrace_nmi_enter(), I found the atomic variable nmi_running does protect NMI VS do_ftrace_mod_code(), but it can not protects NMI(entered nmi) VS NMI(ftrace_nmi_enter()). cpu#1 | cpu#2 | cpu#3 ftrace_nmi_enter() | do_ftrace_mod_code() | not modify | | ------------------------|-----------------------|-- executing | set mod_code_write = 1| executing --|-----------------------|-------------------- executing | | ftrace_nmi_enter() executing | | do modify ------------------------|-----------------------|----------------- ftrace_nmi_exit() | | cpu#3 may be being modified the code which is still being executed on cpu#1, it will have undefined results and possibly take a GPF, this patch prevents it occurred. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <49C0B411.30003@cn.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-13tracing/syscalls: support for syscalls tracing on x86Frederic Weisbecker
Extend x86 architecture syscall tracing support with syscall metadata table details. (The upcoming core syscall tracing modifications rely on this.) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236955332-10133-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05tracing/function-graph-tracer: use the more lightweight local clockFrederic Weisbecker
Impact: decrease hangs risks with the graph tracer on slow systems Since the function graph tracer can spend too much time on timer interrupts, it's better now to use the more lightweight local clock. Anyway, the function graph traces are more reliable on a per cpu trace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22Merge branch 'tip/x86/ftrace' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace Conflicts: include/linux/ftrace.h kernel/trace/ftrace.c
2009-02-20ftrace: immediately stop code modification if failure is detectedSteven Rostedt
Impact: fix to prevent NMI lockup If the page fault handler produces a WARN_ON in the modifying of text, and the system is setup to have a high frequency of NMIs, we can lock up the system on a failure to modify code. The modifying of code with NMIs allows all NMIs to modify the code if it is about to run. This prevents a modifier on one CPU from modifying code running in NMI context on another CPU. The modifying is done through stop_machine, so only NMIs must be considered. But if the write causes the page fault handler to produce a warning, the print can slow it down enough that as soon as it is done it will take another NMI before going back to the process context. The new NMI will perform the write again causing another print and this will hang the box. This patch turns off the writing as soon as a failure is detected and does not wait for it to be turned off by the process context. This will keep NMIs from getting stuck in this back and forth of print outs. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20ftrace, x86: make kernel text writable only for conversionsSteven Rostedt
Impact: keep kernel text read only Because dynamic ftrace converts the calls to mcount into and out of nops at run time, we needed to always keep the kernel text writable. But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts the kernel code to writable before ftrace modifies the text, and converts it back to read only afterward. The kernel text is converted to read/write, stop_machine is called to modify the code, then the kernel text is converted back to read only. The original version used SYSTEM_STATE to determine when it was OK or not to change the code to rw or ro. Andrew Morton pointed out that using SYSTEM_STATE is a bad idea since there is no guarantee to what its state will actually be. Instead, I moved the check into the set_kernel_text_* functions themselves, and use a local variable to determine when it is OK to change the kernel text RW permissions. [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-19Merge branch 'mainline/function-graph' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/function-graph-tracer
2009-02-18tracing/function-graph-tracer: make arch generic push pop functionsSteven Rostedt
There is nothing really arch specific of the push and pop functions used by the function graph tracer. This patch moves them to generic code. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-11tracing, x86: fix constraint for parent variableSteven Rostedt
The constraint used for retrieving and restoring the parent function pointer is incorrect. The parent variable is a pointer, and the address of the pointer is modified by the asm statement and not the pointer itself. It is incorrect to pass it in as an output constraint since the asm will never update the pointer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-11Merge branch 'tip/tracing/ftrace' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
2009-02-11Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/coreIngo Molnar
2009-02-10tracing, x86: fix fixup section to return to original codeSteven Rostedt
Impact: fix to prevent a kernel crash on fault If for some reason the pointer to the parent function on the stack takes a fault, the fix up code will not return back to the original faulting code. This can lead to unpredictable results and perhaps even a kernel panic. A fault should not happen, but if it does, we should simply disable the tracer, warn, and continue running the kernel. It should not lead to a kernel crash. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-10tracing, x86: fix constraint for parent variableSteven Rostedt
The constraint used for retrieving and restoring the parent function pointer is incorrect. The parent variable is a pointer, and the address of the pointer is modified by the asm statement and not the pointer itself. It is incorrect to pass it in as an output constraint since the asm will never update the pointer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-09tracing/function-graph-tracer: drop the kernel_text_address checkFrederic Weisbecker
When the function graph tracer picks a return address, it ensures this address is really a kernel text one by calling __kernel_text_address() Actually this path has never been taken.Its role was more likely to debug the tracer on the beginning of its development but this function is wasteful since it is called for every traced function. The fault check is already sufficient. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-08ring-buffer: use generic version of in_nmiSteven Rostedt
Impact: clean up Now that a generic in_nmi is available, this patch removes the special code in the ring_buffer and implements the in_nmi generic version instead. With this change, I was also able to rename the "arch_ftrace_nmi_enter" back to "ftrace_nmi_enter" and remove the code from the ring buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-08ftrace: change function graph tracer to use new in_nmiSteven Rostedt
The function graph tracer piggy backed onto the dynamic ftracer to use the in_nmi custom code for dynamic tracing. The problem was (as Andrew Morton pointed out) it really only wanted to bail out if the context of the current CPU was in NMI context. But the dynamic ftrace in_nmi custom code was true if _any_ CPU happened to be in NMI context. Now that we have a generic in_nmi interface, this patch changes the function graph code to use it instead of the dynamic ftarce custom code. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-08ftrace, x86: rename in_nmi variableSteven Rostedt
Impact: clean up The in_nmi variable in x86 arch ftrace.c is a misnomer. Andrew Morton pointed out that the in_nmi variable is incremented by all CPUS. It can be set when another CPU is running an NMI. Since this is actually intentional, the fix is to rename it to what it really is: "nmi_running" Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-08ring-buffer: add NMI protection for spinlocksSteven Rostedt
Impact: prevent deadlock in NMI The ring buffers are not yet totally lockless with writing to the buffer. When a writer crosses a page, it grabs a per cpu spinlock to protect against a reader. The spinlocks taken by a writer are not to protect against other writers, since a writer can only write to its own per cpu buffer. The spinlocks protect against readers that can touch any cpu buffer. The writers are made to be reentrant with the spinlocks disabling interrupts. The problem arises when an NMI writes to the buffer, and that write crosses a page boundary. If it grabs a spinlock, it can be racing with another writer (since disabling interrupts does not protect against NMIs) or with a reader on the same CPU. Luckily, most of the users are not reentrant and protects against this issue. But if a user of the ring buffer becomes reentrant (which is what the ring buffers do allow), if the NMI also writes to the ring buffer then we risk the chance of a deadlock. This patch moves the ftrace_nmi_enter called by nmi_enter() to the ring buffer code. It replaces the current ftrace_nmi_enter that is used by arch specific code to arch_ftrace_nmi_enter and updates the Kconfig to handle it. When an NMI is called, it will set a per cpu variable in the ring buffer code and will clear it when the NMI exits. If a write to the ring buffer crosses page boundaries inside an NMI, a trylock is used on the spin lock instead. If the spinlock fails to be acquired, then the entry is discarded. This bug appeared in the ftrace work in the RT tree, where event tracing is reentrant. This workaround solved the deadlocks that appeared there. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-01-27x86: ftrace - simplify wait_for_nmiCyrill Gorcunov
Get rid of 'waited' stack variable. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08tracing/function-graph-tracer: append the tracing_graph_flagFrederic Weisbecker
Impact: Provide a way to pause the function graph tracer As suggested by Steven Rostedt, the previous patch that prevented from spinlock function tracing shouldn't use the raw_spinlock to fix it. It's much better to follow lockdep with normal spinlock, so this patch adds a new flag for each task to make the function graph tracer able to be paused. We also can send an ftrace_printk whithout worrying of the irrelevant traced spinlock during insertion. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03ftrace: add checks on ret stack in function graphSteven Rostedt
Import: robustness checks Add more checks in the function graph code to detect errors and perhaps print out better information if a bug happens. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03ftrace: function graph return for function entrySteven Rostedt
Impact: feature, let entry function decide to trace or not This patch lets the graph tracer entry function decide if the tracing should be done at the end as well. This requires all function graph entry functions return 1 if it should trace, or 0 if the return should not be traced. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03ftrace: add ftrace_graph_stop()Steven Rostedt
Impact: new ftrace_graph_stop function While developing more features of function graph, I hit a bug that caused the WARN_ON to trigger in the prepare_ftrace_return function. Well, it was hard for me to find out that was happening because the bug would not print, it would just cause a hard lockup or reboot. The reason is that it is not safe to call printk from this function. Looking further, I also found that it calls unregister_ftrace_graph, which grabs a mutex and calls kstop machine. This would definitely lock the box up if it were to trigger. This patch adds a fast and safe ftrace_graph_stop() which will stop the function tracer. Then it is safe to call the WARN ON. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03ftrace: clean up function graph asmSteven Rostedt
Impact: clean up There exists macros for x86 asm to handle x86_64 and i386. This patch updates function graph asm to use them. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02tracing/function-graph-tracer: support for x86-64Frederic Weisbecker
Impact: extend and enable the function graph tracer to 64-bit x86 This patch implements the support for function graph tracer under x86-64. Both static and dynamic tracing are supported. This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I wanted to use probe_kernel_read/write to make the return address saving/patching code more generic but it causes tracing recursion. That would be perhaps useful to implement a notrace version of these function for other archs ports. Note that arch/x86/process_64.c is not traced, as in X86-32. I first thought __switch_to() was responsible of crashes during tracing because I believed current task were changed inside but that's actually not the case (actually yes, but not the "current" pointer). So I will have to investigate to find the functions that harm here, to enable tracing of the other functions inside (but there is no issue at this time, while process_64.c stays out of -pg flags). A little possible race condition is fixed inside this patch too. When the tracer allocate a return stack dynamically, the current depth is not initialized before but after. An interrupt could occur at this time and, after seeing that the return stack is allocated, the tracer could try to trace it with a random uninitialized depth. It's a prevention, even if I hadn't problems with it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26ftrace: use code patching for ftrace graph tracerSteven Rostedt
Impact: more efficient code for ftrace graph tracer This patch uses the dynamic patching, when available, to patch the function graph code into the kernel. This patch will ease the way for letting both function tracing and function graph tracing run together. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26tracing/function-return-tracer: set a more human readable outputFrederic Weisbecker
Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26tracing/function-return-tracer: change the name into function-graph-tracerFrederic Weisbecker
Impact: cleanup This patch changes the name of the "return function tracer" into function-graph-tracer which is a more suitable name for a tracing which makes one able to retrieve the ordered call stack during the code flow. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23tracing/function-return-tracer: store return stack into task_struct and ↵Frederic Weisbecker
allocate it dynamically Impact: use deeper function tracing depth safely Some tests showed that function return tracing needed a more deeper depth of function calls. But it could be unsafe to store these return addresses to the stack. So these arrays will now be allocated dynamically into task_struct of current only when the tracer is activated. Typical scheme when tracer is activated: - allocate a return stack for each task in global list. - fork: allocate the return stack for the newly created task - exit: free return stack of current - idle init: same as fork I chose a default depth of 50. I don't have overruns anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18tracing/function-return-tracer: add the overrun fieldFrederic Weisbecker
Impact: help to find the better depth of trace We decided to arbitrary define the depth of function return trace as "20". Perhaps this is not enough. To help finding an optimal depth, we measure now the overrun: the number of functions that have been missed for the current thread. By default this is not displayed, we have to do set a particular flag on the return tracer: echo overrun > /debug/tracing/trace_options And the overrun will be printed on the right. As the trace shows below, the current 20 depth is not enough. update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838) update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838) do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838) tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838) tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838) vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274) vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274) set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274) con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274) release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274) release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274) con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274) con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274) n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274) smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274) irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274) smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274) ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274) ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274) ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274) hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16tracing/function-return-tracer: support for dynamic ftrace on function ↵Frederic Weisbecker
return tracer This patch adds the support for dynamic tracing on the function return tracer. The whole difference with normal dynamic function tracing is that we don't need to hook on a particular callback. The only pro that we want is to nop or set dynamically the calls to ftrace_caller (which is ftrace_return_caller here). Some security checks ensure that we are not trying to launch dynamic tracing for return tracing while normal function tracing is already running. An example of trace with getnstimeofday set as a filter: ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns) ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16tracing/function-return-tracer: add a barrier to ensure return stack index ↵Frederic Weisbecker
is incremented in memory Impact: fix possible race condition in ftrace function return tracer This fixes a possible race condition if index incrementation is not immediately flushed in memory. Thanks for Andi Kleen and Steven Rostedt for pointing out this issue and give me this solution. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16ftrace: pass module struct to arch dynamic ftrace functionsSteven Rostedt
Impact: allow archs more flexibility on dynamic ftrace implementations Dynamic ftrace has largly been developed on x86. Since x86 does not have the same limitations as other architectures, the ftrace interaction between the generic code and the architecture specific code was not flexible enough to handle some of the issues that other architectures have. Most notably, module trampolines. Due to the limited branch distance that archs make in calling kernel core code from modules, the module load code must create a trampoline to jump to what will make the larger jump into core kernel code. The problem arises when this happens to a call to mcount. Ftrace checks all code before modifying it and makes sure the current code is what it expects. Right now, there is not enough information to handle modifying module trampolines. This patch changes the API between generic dynamic ftrace code and the arch dependent code. There is now two functions for modifying code: ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into a nop, where the original text is calling addr. (mod is the module struct if called by module init) ftrace_make_caller(rec, addr) - convert the code rec->ip that should be a nop into a caller to addr. The record "rec" now has a new field called "arch" where the architecture can add any special attributes to each call site record. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>