summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/kprobes/core.c
AgeCommit message (Collapse)Author
2015-04-14Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
2015-03-23x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'Andy Lutomirski
user_mode_vm() and user_mode() are now the same. Change all callers of user_mode_vm() to user_mode(). The next patch will remove the definition of user_mode_vm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org [ Merged to a more recent kernel. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17kprobes/x86: Return correct length in __copy_instruction()Eugene Shatokhin
On x86-64, __copy_instruction() always returns 0 (error) if the instruction uses %rip-relative addressing. This is because kernel_insn_init() is called the second time for 'insn' instance in such cases and sets all its fields to 0. Because of this, trying to place a kprobe on such instruction will fail, register_kprobe() will return -EINVAL. This patch fixes the problem. Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20150317100918.28349.94654.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-21kprobes/x86: Check for invalid ftrace location in __recover_probed_insn()Petr Mladek
__recover_probed_insn() should always be called from an address where an instructions starts. The check for ftrace_location() might help to discover a potential inconsistency. This patch adds WARN_ON() when the inconsistency is detected. Also it adds handling of the situation when the original code can not get recovered. Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Petr Mladek <pmladek@suse.cz> Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1424441250-27146-3-git-send-email-pmladek@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-21kprobes/x86: Use 5-byte NOP when the code might be modified by ftracePetr Mladek
can_probe() checks if the given address points to the beginning of an instruction. It analyzes all the instructions from the beginning of the function until the given address. The code might be modified by another Kprobe. In this case, the current code is read into a buffer, int3 breakpoint is replaced by the saved opcode in the buffer, and can_probe() analyzes the buffer instead. There is a bug that __recover_probed_insn() tries to restore the original code even for Kprobes using the ftrace framework. But in this case, the opcode is not stored. See the difference between arch_prepare_kprobe() and arch_prepare_kprobe_ftrace(). The opcode is stored by arch_copy_kprobe() only from arch_prepare_kprobe(). This patch makes Kprobe to use the ideal 5-byte NOP when the code can be modified by ftrace. It is the original instruction, see ftrace_make_nop() and ftrace_nop_replace(). Note that we always need to use the NOP for ftrace locations. Kprobes do not block ftrace and the instruction might get modified at anytime. It might even be in an inconsistent state because it is modified step by step using the int3 breakpoint. The patch also fixes indentation of the touched comment. Note that I found this problem when playing with Kprobes. I did it on x86_64 with gcc-4.8.3 that supported -mfentry. I modified samples/kprobes/kprobe_example.c and added offset 5 to put the probe right after the fentry area: static struct kprobe kp = { .symbol_name = "do_fork", + .offset = 5, }; Then I was able to load kprobe_example before jprobe_example but not the other way around: $> modprobe jprobe_example $> modprobe kprobe_example modprobe: ERROR: could not insert 'kprobe_example': Invalid or incomplete multibyte or wide character It did not make much sense and debugging pointed to the bug described above. Signed-off-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1424441250-27146-2-git-send-email-pmladek@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18kprobes/x86: Mark 2 bytes NOP as boostableWang Nan
Currently, x86 kprobes is unable to boost 2 bytes nop like: nopl 0x0(%rax,%rax,1) which is 0x0f 0x1f 0x44 0x00 0x00. Such nops have exactly 5 bytes to hold a relative jmp instruction. Boosting them should be obviously safe. This patch enable boosting such nops by simply updating twobyte_is_boostable[] array. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1423532045-41049-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-15ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracingSteven Rostedt (Red Hat)
If the function graph tracer traces a jprobe callback, the system will crash. This can easily be demonstrated by compiling the jprobe sample module that is in the kernel tree, loading it and running the function graph tracer. # modprobe jprobe_example.ko # echo function_graph > /sys/kernel/debug/tracing/current_tracer # ls The first two commands end up in a nice crash after the first fork. (do_fork has a jprobe attached to it, so "ls" just triggers that fork) The problem is caused by the jprobe_return() that all jprobe callbacks must end with. The way jprobes works is that the function a jprobe is attached to has a breakpoint placed at the start of it (or it uses ftrace if fentry is supported). The breakpoint handler (or ftrace callback) will copy the stack frame and change the ip address to return to the jprobe handler instead of the function. The jprobe handler must end with jprobe_return() which swaps the stack and does an int3 (breakpoint). This breakpoint handler will then put back the saved stack frame, simulate the instruction at the beginning of the function it added a breakpoint to, and then continue on. For function tracing to work, it hijakes the return address from the stack frame, and replaces it with a hook function that will trace the end of the call. This hook function will restore the return address of the function call. If the function tracer traces the jprobe handler, the hook function for that handler will not be called, and its saved return address will be used for the next function. This will result in a kernel crash. To solve this, pause function tracing before the jprobe handler is called and unpause it before it returns back to the function it probed. Some other updates: Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the code look a bit cleaner and easier to understand (various tries to fix this bug required this change). Note, if fentry is being used, jprobes will change the ip address before the function graph tracer runs and it will not be able to trace the function that the jprobe is probing. Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org Cc: stable@vger.kernel.org # 2.6.30+ Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-17x86: Remove arbitrary instruction size limit in instruction decoderDave Hansen
The current x86 instruction decoder steps along through the instruction stream but always ensures that it never steps farther than the largest possible instruction size (MAX_INSN_SIZE). The MPX code is now going to be doing some decoding of userspace instructions. We copy those from userspace in to the kernel and they're obviously completely untrusted coming from userspace. In addition to the constraint that instructions can only be so long, we also have to be aware of how long the buffer is that came in from userspace. This _looks_ to be similar to what the perf and kprobes is doing, but it's unclear to me whether they are affected. The whole reason we need this is that it is perfectly valid to be executing an instruction within MAX_INSN_SIZE bytes of an unreadable page. We should be able to gracefully handle short reads in those cases. This adds support to the decoder to record how long the buffer being decoded is and to refuse to "validate" the instruction if we would have gone over the end of the buffer to decode it. The kprobes code probably needs to be looked at here a bit more carefully. This patch still respects the MAX_INSN_SIZE limit there but the kprobes code does look like it might be able to be a bit more strict than it currently is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: x86@kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-16kprobes/x86: Don't try to resolve kprobe faults from userspaceAndy Lutomirski
This commit: commit 6f6343f53d133bae516caf3d254bce37d8774625 Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Date: Thu Apr 17 17:17:33 2014 +0900 kprobes/x86: Call exception handlers directly from do_int3/do_debug appears to have inadvertently dropped a check that the int3 came from kernel mode. Trying to dereference addr when addr is user-controlled is completely bogus. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/c4e339882c121aa76254f2adde3fcbdf502faec2.1405099506.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotationMasami Hiramatsu
Use NOKPROBE_SYMBOL macro for protecting functions from kprobes instead of __kprobes annotation under arch/x86. This applies nokprobe_inline annotation for some cases, because NOKPROBE_SYMBOL() will inhibit inlining by referring the symbol address. This just folds a bunch of previous NOKPROBE_SYMBOL() cleanup patches for x86 to one patch. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes/x86: Allow probe on some kprobe preparation functionsMasami Hiramatsu
There is no need to prohibit probing on the functions used in preparation phase. Those are safely probed because those are not invoked from breakpoint/fault/debug handlers, there is no chance to cause recursive exceptions. Following functions are now removed from the kprobes blacklist: can_boost can_probe can_optimize is_IF_modifier __copy_instruction copy_optimized_instructions arch_copy_kprobe arch_prepare_kprobe arch_arm_kprobe arch_disarm_kprobe arch_remove_kprobe arch_trampoline_kprobe arch_prepare_kprobe_ftrace arch_prepare_optimized_kprobe arch_check_optimized_kprobe arch_within_optimized_kprobe __arch_remove_optimized_kprobe arch_remove_optimized_kprobe arch_optimize_kprobes arch_unoptimize_kprobe I tested those functions by putting kprobes on all instructions in the functions with the bash script I sent to LKML. See: https://lkml.org/lkml/2014/3/27/33 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Link: http://lkml.kernel.org/r/20140417081747.26341.36065.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes/x86: Call exception handlers directly from do_int3/do_debugMasami Hiramatsu
To avoid a kernel crash by probing on lockdep code, call kprobe_int3_handler() and kprobe_debug_handler()(which was formerly called post_kprobe_handler()) directly from do_int3 and do_debug. Currently kprobes uses notify_die() to hook the int3/debug exceptoins. Since there is a locking code in notify_die, the lockdep code can be invoked. And because the lockdep involves printk() related things, theoretically, we need to prohibit probing on such code, which means much longer blacklist we'll have. Instead, hooking the int3/debug for kprobes before notify_die() can avoid this problem. Anyway, most of the int3 handlers in the kernel are already called from do_int3 directly, e.g. ftrace_int3_handler, poke_int3_handler, kgdb_ll_trap. Actually only kprobe_exceptions_notify is on the notifier_call_chain. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081733.26341.24423.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes: Prohibit probing on .entry.text codeMasami Hiramatsu
.entry.text is a code area which is used for interrupt/syscall entries, which includes many sensitive code. Thus, it is better to prohibit probing on all of such code instead of a part of that. Since some symbols are already registered on kprobe blacklist, this also removes them from the blacklist. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081658.26341.57354.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes/x86: Allow to handle reentered kprobe on single-steppingMasami Hiramatsu
Since the NMI handlers(e.g. perf) can interrupt in the single stepping (or preparing the single stepping, do_debug etc.), we should consider a kprobe is hit in the NMI handler. Even in that case, the kprobe is allowed to be reentered as same as the kprobes hit in kprobe handlers (KPROBE_HIT_ACTIVE or KPROBE_HIT_SSDONE). The real issue will happen when a kprobe hit while another reentered kprobe is processing (KPROBE_REENTER), because we already consumed a saved-area for the previous kprobe. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Link: http://lkml.kernel.org/r/20140417081651.26341.10593.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17kprobes/x86: Fix page-fault handling logicMasami Hiramatsu
Current kprobes in-kernel page fault handler doesn't expect that its single-stepping can be interrupted by an NMI handler which may cause a page fault(e.g. perf with callback tracing). In that case, the page-fault handled by kprobes and it misunderstands the page-fault has been caused by the single-stepping code and tries to recover IP address to probed address. But the truth is the page-fault has been caused by the NMI handler, and do_page_fault failes to handle real page fault because the IP address is modified and causes Kernel BUGs like below. ---- [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40 To handle this correctly, I fixed the kprobes fault handler to ensure the faulted ip address is its own single-step buffer instead of checking current kprobe state. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: fche@redhat.com Cc: systemtap@sourceware.org Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-04Merge branch 'x86-asmlinkage-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asmlinkage changes from Ingo Molnar: "As a preparation for Andi Kleen's LTO patchset (link time optimizations using GCC's -flto which build time optimization has steadily increased in quality over the past few years and might eventually be usable for the kernel too) this tree includes a handful of preparatory patches that make function calling convention annotations consistent again: - Mark every function without arguments (or 64bit only) that is used by assembly code with asmlinkage() - Mark every function with parameters or variables that is used by assembly code as __visible. For the vanilla kernel this has documentation, consistency and debuggability advantages, for the time being" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asmlinkage: Fix warning in xen asmlinkage change x86, asmlinkage, vdso: Mark vdso variables __visible x86, asmlinkage, power: Make various symbols used by the suspend asm code visible x86, asmlinkage: Make dump_stack visible x86, asmlinkage: Make 64bit checksum functions visible x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops x86, asmlinkage, apm: Make APM data structure used from assembler visible x86, asmlinkage: Make syscall tables visible x86, asmlinkage: Make several variables used from assembler/linker script visible x86, asmlinkage: Make kprobes code visible and fix assembler code x86, asmlinkage: Make various syscalls asmlinkage x86, asmlinkage: Make 32bit/64bit __switch_to visible x86, asmlinkage: Make _*_start_kernel visible x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible x86, asmlinkage: Change dotraplinkage into __visible on 32bit x86: Fix sys_call_table type in asm/syscall.h
2013-08-06x86, asmlinkage: Make kprobes code visible and fix assembler codeAndi Kleen
- Make all the external assembler template symbols __visible - Move the templates inline assembler code into a top level assembler statement, not inside a function. This avoids it being optimized away or cloned. Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-19kprobes/x86: Use text_poke_bp() instead of text_poke_smp*()Masami Hiramatsu
Use text_poke_bp() for optimizing kprobes instead of text_poke_smp*(). Since the number of kprobes is usually not so large (<100) and text_poke_bp() is much lighter than text_poke_smp() [which uses stop_machine()], this just stops using batch processing. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114750.26675.9174.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20kprobes: Fix arch_prepare_kprobe to handle copy insn failuresMasami Hiramatsu
Fix arch_prepare_kprobe() to handle failures in copy instruction correctly. This fix is related to the previous fix: 8101376 which made __copy_instruction return an error result if failed, but caller site was not updated to handle it. Thus, this is the other half of the bugfix. This fix is also related to the following bug-report: https://bugzilla.redhat.com/show_bug.cgi?id=910649 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Jonathan Lebon <jlebon@redhat.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: systemtap@sourceware.org Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-08kprobes/x86: Just return error for sanity check failure instead of using BUG_ONMasami Hiramatsu
Return an error from __copy_instruction() and use printk() to give us a more productive message, since this is just an error case which we can handle and also the BUG_ON() never tells us why and what happened. This is related to the following bug-report: https://bugzilla.redhat.com/show_bug.cgi?id=910649 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20130404104230.22862.85242.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-03-18kprobes/x86: Check Interrupt Flag modifier when registering probeMasami Hiramatsu
Currently kprobes check whether the copied instruction modifies IF (interrupt flag) on each probe hit. This results not only in introducing overhead but also involving inat_get_opcode_attribute into the kprobes hot path, and it can cause an infinite recursive call (and kernel panic in the end). Actually, since the copied instruction itself can never be modified on the buffer, it is needless to analyze the instruction on every probe hit. To fix this issue, we check it only once when registering probe and store the result on ainsn->if_modifier. Reported-by: Timo Juhani Lindfors <timo.lindfors@iki.fi> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130314115242.19690.33573.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-28hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-21kprobes/x86: Move kprobes stuff under arch/x86/kernel/kprobes/Masami Hiramatsu
Move arch-dep kprobes stuff under arch/x86/kernel/kprobes. Link: http://lkml.kernel.org/r/20120928081522.3560.75469.stgit@ltc138.sdl.hitachi.co.jp Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ fixed whitespace and s/__attribute__((packed))/__packed/ ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>