summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2017-04-21ftrace: Fix function pid filter on instancesNamhyung Kim
commit d879d0b8c183aabeb9a65eba91f3f9e3c7e7b905 upstream. When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21ftrace: Fix removing of second function probeSteven Rostedt (VMware)
commit 82cc4fc2e70ec5baeff8f776f2773abc8b2cc0ae upstream. When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12ring-buffer: Fix return value check in test_ringbuffer()Wei Yongjun
commit 62277de758b155dc04b78f195a1cb5208c37b2df upstream. In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest") Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15fs: Better permission checking for submountsEric W. Biederman
commit 93faccbbfa958a9668d3ab4e30f38dd205cee8d8 upstream. To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Fixes: 069d5ac9ae0d ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds") Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-09tracing: Fix hwlat kthread migrationSteven Rostedt (VMware)
commit 79c6f448c8b79c321e4a1f31f98194e4f6b6cae7 upstream. The hwlat tracer creates a kernel thread at start of the tracer. It is pinned to a single CPU and will move to the next CPU after each period of running. If the user modifies the migration thread's affinity, it will not change after that happens. The original code created the thread at the first instance it was called, but later was changed to destroy the thread after the tracer was finished, and would not be created until the next instance of the tracer was established. The code that initialized the affinity was only called on the initial instantiation of the tracer. After that, it was not initialized, and the previous affinity did not match the current newly created one, making it appear that the user modified the thread's affinity when it did not, and the thread failed to migrate again. Fixes: 0330f7aa8ee6 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09fgraph: Handle a case where a tracer ignores set_graph_notraceSteven Rostedt (Red Hat)
commit 794de08a16cf1fc1bf785dc48f66d36218cf6d88 upstream. Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-14ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip recordsSteven Rostedt (Red Hat)
When a module is first loaded and its function ip records are added to the ftrace list of functions to modify, they are set to DISABLED, as their text is still in a read only state. When the module is fully loaded, and can be updated, the flag is cleared, and if their's any functions that should be tracing them, it is updated at that moment. But there's several locations that do record accounting and should ignore records that are marked as disabled, or they can cause issues. Alexei already fixed one location, but others need to be addressed. Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace recordsAlexei Starovoitov
ftrace_shutdown() checks for sanity of ftrace records and if dyn_ftrace->flags is not zero, it will warn. It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point, since some module was loaded, but before ftrace_module_enable() cleared the flags for this module. In other words the module.c is doing: ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED ... // here ftrace_shutdown() is called that warns, since err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED Fix it by ignoring disabled records. It's similar to what __ftrace_hash_rec_update() is already doing. Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-10-12Disable the __builtin_return_address() warning globally after allLinus Torvalds
This affectively reverts commit 377ccbb48373 ("Makefile: Mute warning for __builtin_return_address(>0) for tracing only") because it turns out that it really isn't tracing only - it's all over the tree. We already also had the warning disabled separately for mm/usercopy.c (which this commit also removes), and it turns out that we will also want to disable it for get_lock_parent_ip(), that is used for at least TRACE_IRQFLAGS. Which (when enabled) ends up being all over the tree. Steven Rostedt had a patch that tried to limit it to just the config options that actually triggered this, but quite frankly, the extra complexity and abstraction just isn't worth it. We have never actually had a case where the warning is actually useful, so let's just disable it globally and not worry about it. Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "The usual rocket science from the trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tracing/syscalls: fix multiline in error message text lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description doc: vfs: fix fadvise() sycall name x86/entry: spell EBX register correctly in documentation securityfs: fix securityfs_create_dir comment irq: Fix typo in tracepoint.xml
2016-10-06Merge tag 'trace-v4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release cycle is rather small. Just a few fixes to tracing. The big change is the addition of the hwlat tracer. It not only detects SMIs, but also other latency that's caused by the hardware. I have detected some latency from large boxes having bus contention" * tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Call traceoff trigger after event is recorded ftrace/scripts: Add helper script to bisect function tracing problem functions tracing: Have max_latency be defined for HWLAT_TRACER as well tracing: Add NMI tracing in hwlat detector tracing: Have hwlat trace migrate across tracing_cpumask CPUs tracing: Add documentation for hwlat_detector tracer tracing: Added hardware latency tracer ftrace: Access ret_stack->subtime only in the function profiler function_graph: Handle TRACE_BPUTS in print_graph_comment tracing/uprobe: Drop isdigit() check in create_trace_uprobe
2016-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and co. at Google. https://lwn.net/Articles/701165/ 2) Do TCP Small Queues for retransmits, from Eric Dumazet. 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei Starovoitov. 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai. 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn. 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker. 7) Support ndo_poll_controller in mlx5, from Calvin Owens. 8) Move VRF processing to an output hook and allow l3mdev to be loopback, from David Ahern. 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern. 10) Congestion control in RXRPC, from David Howells. 11) Support geneve RX offload in ixgbe, from Emil Tantilov. 12) When hitting pressure for new incoming TCP data SKBs, perform a partial rathern than a full purge of the OFO queue (which could be huge). From Eric Dumazet. 13) Convert XFRM state and policy lookups to RCU, from Florian Westphal. 14) Support RX network flow classification to igb, from Gangfeng Huang. 15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski. 16) New skbmod packet action, from Jamal Hadi Salim. 17) Remove some inefficiencies in snmp proc output, from Jia He. 18) Add FIB notifications to properly propagate route changes to hardware which is doing forwarding offloading. From Jiri Pirko. 19) New dsa driver for qca8xxx chips, from John Crispin. 20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej Żenczykowski. 21) Add L3 mode to ipvlan, from Mahesh Bandewar. 22) Support 802.1ad in mlx4, from Moshe Shemesh. 23) Support hardware LRO in mediatek driver, from Nelson Chang. 24) Add TC offloading to mlx5, from Or Gerlitz. 25) Convert various drivers to ethtool ksettings interfaces, from Philippe Reynes. 26) TX max rate limiting for cxgb4, from Rahul Lakkireddy. 27) NAPI support for ath10k, from Rajkumar Manoharan. 28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed. 29) UDP replicast support in TIPC, from Richard Alpe. 30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru. 31) Support BQL in thunderx driver, from Sunil Goutham. 32) TSO support in alx driver, from Tobias Regnery. 33) Add stream parser engine and use it in kcm. 34) Support async DHCP replies in ipconfig module, from Uwe Kleine-König. 35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits) mlxsw: switchx2: Fix misuse of hard_header_len mlxsw: spectrum: Fix misuse of hard_header_len net/faraday: Stop NCSI device on shutdown net/ncsi: Introduce ncsi_stop_dev() net/ncsi: Rework the channel monitoring net/ncsi: Allow to extend NCSI request properties net/ncsi: Rework request index allocation net/ncsi: Don't probe on the reserved channel ID (0x1f) net/ncsi: Introduce NCSI_RESERVED_CHANNEL net/ncsi: Avoid unused-value build warning from ia64-linux-gcc net: Add netdev all_adj_list refcnt propagation to fix panic net: phy: Add Edge-rate driver for Microsemi PHYs. vmxnet3: Wake queue from reset work i40e: avoid NULL pointer dereference and recursive errors on early PCI error qed: Add RoCE ll2 & GSI support qed: Add support for memory registeration verbs qed: Add support for QP verbs qed: PD,PKEY and CQ verb support qed: Add support for RoCE hw init qede: Add qedr framework ...
2016-10-03Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull low-level x86 updates from Ingo Molnar: "In this cycle this topic tree has become one of those 'super topics' that accumulated a lot of changes: - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on x86 - preceded by an array of changes. v4.8 saw preparatory changes in this area already - this is the rest of the work. Includes the thread stack caching performance optimization. (Andy Lutomirski) - switch_to() cleanups and all around enhancements. (Brian Gerst) - A large number of dumpstack infrastructure enhancements and an unwinder abstraction. The secret long term plan is safe(r) live patching plus maybe another attempt at debuginfo based unwinding - but all these current bits are standalone enhancements in a frame pointer based debug environment as well. (Josh Poimboeuf) - More __ro_after_init and const annotations. (Kees Cook) - Enable KASLR for the vmemmap memory region. (Thomas Garnier)" [ The virtually mapped stack changes are pretty fundamental, and not x86-specific per se, even if they are only used on x86 right now. ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) x86/asm: Get rid of __read_cr4_safe() thread_info: Use unsigned long for flags x86/alternatives: Add stack frame dependency to alternative_call_2() x86/dumpstack: Fix show_stack() task pointer regression x86/dumpstack: Remove dump_trace() and related callbacks x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder oprofile/x86: Convert x86_backtrace() to use the new unwinder x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder perf/x86: Convert perf_callchain_kernel() to use the new unwinder x86/unwind: Add new unwind interface and implementations x86/dumpstack: Remove NULL task pointer convention fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK lib/syscall: Pin the task stack in collect_syscall() x86/process: Pin the target stack in get_wchan() x86/dumpstack: Pin the target stack when dumping it kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function sched/core: Add try_get_task_stack() and put_task_stack() x86/entry/64: Fix a minor comment rebase error iommu/amd: Don't put completion-wait semaphore on stack ...
2016-10-03Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - uprobes enhancements (Masami Hiramatsu) - Uncore group events enhancements (David Carrillo-Cisneros) - x86 Intel: Add support for Skylake server uncore PMUs (Kan Liang) - x86 Intel: LBR cleanups and enhancements, for better branch annotation tracking (Peter Zijlstra) - x86 Intel: Add support for PTWRITE and power event tracing (Alexander Shishkin) - ... various fixes, cleanups and smaller enhancements. Lots of tooling changes - a couple of highlights: - Support event group view with hierarchy mode in 'perf top' and 'perf report' (Namhyung Kim) e.g.: $ perf record -e '{cycles,instructions}' make $ perf report --hierarchy --stdio ... # Overhead Command / Shared Object / Symbol # ...................... .................................. ... 25.74% 27.18%sh 19.96% 24.14%libc-2.24.so 9.55% 14.64%[.] __strcmp_sse2 1.54% 0.00%[.] __tfind 1.07% 1.13%[.] _int_malloc 0.95% 0.00%[.] __strchr_sse2 0.89% 1.39%[.] __tsearch 0.76% 0.00%[.] strlen - Add branch stack / basic block info to 'perf annotate --stdio', where for each branch, we add an asm comment after the instruction with information on how often it was taken and predicted. See example with color output at: http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png (Peter Zijlstra) - Add support for using symbols in address filters with Intel PT and ARM CoreSight (hardware assisted tracing facilities) (Adrian Hunter, Mathieu Poirier) - Add support for interacting with Coresight PMU ETMs/PTMs, that are IP blocks to perform hardware assisted tracing on a ARM CPU core (Mathieu Poirier) - Support generating cross arch probes, i.e. if you specify a vmlinux file for different arch than the one in the host machine, $ perf probe --definition function_name args will generate the probe definition string needed to append to the target machine /sys/kernel/debug/tracing/kprobes_events file, using scripting (Masami Hiramatsu). - Allow configuring the default 'perf report -s' sort order in ~/.perfconfig, for instance, "sym,dso" may be more fitting for kernel developers. (Arnaldo Carvalho de Melo) - ... plus lots of other changes, refactorings, features and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (149 commits) perf tests: Add dwarf unwind test for powerpc perf probe: Match linkage name with mangled name perf probe: Fix to cut off incompatible chars from group name perf probe: Skip if the function address is 0 perf probe: Ignore the error of finding inline instance perf intel-pt: Fix decoding when there are address filters perf intel-pt: Enable decoder to handle TIP.PGD with missing IP perf intel-pt: Read address filter from AUXTRACE_INFO event perf intel-pt: Record address filter in AUXTRACE_INFO event perf intel-pt: Add a helper function for processing AUXTRACE_INFO perf intel-pt: Fix missing error codes processing auxtrace_info perf intel-pt: Add support for recording the max non-turbo ratio perf intel-pt: Fix snapshot overlap detection decoder errors perf probe: Increase debug level of SDT debug messages perf record: Add support for using symbols in address filters perf symbols: Add dso__last_symbol() perf record: Fix error paths perf record: Rename label 'out_symbol_exit' perf script: Fix vanished idle symbols perf evsel: Add support for address filters ...
2016-10-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30Merge branch 'x86/urgent' into x86/asmThomas Gleixner
Get the cr4 fixes so we can apply the final cleanup
2016-09-29tracing/syscalls: fix multiline in error message textColin Ian King
pr_info message spans two lines and the literal string is missing a white space between words. Add the white space. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-09-26Merge tag 'trace-v4.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs fixes from Steven Rostedt: "Al Viro has been looking at the tracefs code, and has pointed out some issues. This contains one fix by me and one by Al. I'm sure that he'll come up with more but for now I tested these patches and they don't appear to have any negative impact on tracing" * tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data
2016-09-25fix memory leaks in tracing_buffers_splice_read()Al Viro
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-25tracing: Move mutex to protect against resetting of seq dataSteven Rostedt (Red Hat)
The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-23tracing: Call traceoff trigger after event is recordedMasami Hiramatsu
Call traceoff trigger after the event is recorded. Since current traceoff trigger is called before recording the event, we can not know what event stopped tracing. Typical usecase of traceoff/traceon trigger is tracing function calls and trace events between a pair of events. For example, trace function calls between syscall entry/exit. In that case, it is useful if we can see the return code of the target syscall. Link: http://lkml.kernel.org/r/147335074530.12462.4526186083406015005.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-15Merge branch 'linus' into x86/asm, to pick up recent fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-12tracing: Have max_latency be defined for HWLAT_TRACER as wellSteven Rostedt (Red Hat)
The hwlat tracer uses tr->max_latency, and if it's the only tracer enabled that uses it, the build will fail. Add max_latency and its file when the hwlat tracer is enabled. Link: http://lkml.kernel.org/r/d6c3b7eb-ba95-1ffa-0453-464e1e24262a@infradead.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-10bpf: add BPF_CALL_x macros for declaring helpersDaniel Borkmann
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros that are used today. Motivation for this is to hide all the register handling and all necessary casts from the user, so that it is done automatically in the background when adding a BPF_CALL_<n>() call. This makes current helpers easier to review, eases to write future helpers, avoids getting the casting mess wrong, and allows for extending all helpers at once (f.e. build time checks, etc). It also helps detecting more easily in code reviews that unused registers are not instrumented in the code by accident, breaking compatibility with existing programs. BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some fundamental differences, for example, for generating the actual helper function that carries all u64 regs, we need to fill unused regs, so that we always end up with 5 u64 regs as an argument. I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and they look all as expected. No sparse issue spotted. We let this also sit for a few days with Fengguang's kbuild test robot, and there were no issues seen. On s390, it barked on the "uses dynamic stack allocation" notice, which is an old one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion to the call wrapper, just telling that the perf raw record/frag sits on stack (gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests and they were fine as well. All eBPF helpers are now converted to use these macros, getting rid of a good chunk of all the raw castings. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macrosDaniel Borkmann
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit which otherwise often result in overly long bytes_to_bpf_size(sizeof()) and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF()) check in convert_bpf_extensions(), but we should rather make that generic as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF() users to detect any rewriter size issues at compile time. Note, there are currently none, but we want to assert that it stays this way. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-05Merge branch 'perf/urgent' into perf/core, to pick up fixed and resolve ↵Ingo Molnar
conflicts Conflicts: kernel/events/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-02bpf: introduce BPF_PROG_TYPE_PERF_EVENT program typeAlexei Starovoitov
Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE correspondingly in uapi/linux/perf_event.h) The program visible context meta structure is struct bpf_perf_event_data { struct pt_regs regs; __u64 sample_period; }; which is accessible directly from the program: int bpf_prog(struct bpf_perf_event_data *ctx) { ... ctx->sample_period ... ... ctx->regs.ip ... } The bpf verifier rewrites the accesses into kernel internal struct bpf_perf_event_data_kern which allows changing struct perf_sample_data without affecting bpf programs. New fields can be added to the end of struct bpf_perf_event_data in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02tracing: Add NMI tracing in hwlat detectorSteven Rostedt (Red Hat)
As NMIs can also cause latency when interrupts are disabled, the hwlat detectory has no way to know if the latency it detects is from an NMI or an SMI or some other hardware glitch. As ftrace_nmi_enter/exit() funtions are no longer used (except for sh, which isn't supported anymore), I converted those to "arch_ftrace_nmi_enter/exit" and use ftrace_nmi_enter/exit() to check if hwlat detector is tracing or not, and if so, it calls into the hwlat utility. Since the hwlat detector only has a single kthread that is spinning with interrupts disabled, it marks what CPU it is on, and if the NMI callback happens on that CPU, it records the time spent in that NMI. This is added to the output that is generated by the hwlat detector as: #3 inner/outer(us): 9/9 ts:1470836488.206734548 #4 inner/outer(us): 0/8 ts:1470836497.140808588 #5 inner/outer(us): 0/6 ts:1470836499.140825168 nmi-total:5 nmi-count:1 #6 inner/outer(us): 9/9 ts:1470836501.140841748 All time is still tracked in microseconds. The NMI information is only shown when an NMI occurred during the sample. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-02tracing: Have hwlat trace migrate across tracing_cpumask CPUsSteven Rostedt (Red Hat)
Instead of having the hwlat detector thread stay on one CPU, have it migrate across all the CPUs specified by tracing_cpumask. If the user modifies the thread's CPU affinity, the migration will stop until the next instance that the tracer is instantiated. The migration happens at the end of each window (period). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-02tracing: Added hardware latency tracerSteven Rostedt (Red Hat)
The hardware latency tracer has been in the PREEMPT_RT patch for some time. It is used to detect possible SMIs or any other hardware interruptions that the kernel is unaware of. Note, NMIs may also be detected, but that may be good to note as well. The logic is pretty simple. It simply creates a thread that spins on a single CPU for a specified amount of time (width) within a periodic window (window). These numbers may be adjusted by their cooresponding names in /sys/kernel/tracing/hwlat_detector/ The defaults are window = 1000000 us (1 second) width = 500000 us (1/2 second) The loop consists of: t1 = trace_clock_local(); t2 = trace_clock_local(); Where trace_clock_local() is a variant of sched_clock(). The difference of t2 - t1 is recorded as the "inner" timestamp and also the timestamp t1 - prev_t2 is recorded as the "outer" timestamp. If either of these differences are greater than the time denoted in /sys/kernel/tracing/tracing_thresh then it records the event. When this tracer is started, and tracing_thresh is zero, it changes to the default threshold of 10 us. The hwlat tracer in the PREEMPT_RT patch was originally written by Jon Masters. I have modified it quite a bit and turned it into a tracer. Based-on-code-by: Jon Masters <jcm@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01ftrace: Access ret_stack->subtime only in the function profilerNamhyung Kim
The subtime is used only for function profiler with function graph tracer enabled. Move the definition of subtime under CONFIG_FUNCTION_PROFILER to reduce the memory usage. Also move the initialization of subtime into the graph entry callback. Link: http://lkml.kernel.org/r/20160831025529.24018-1-namhyung@kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01function_graph: Handle TRACE_BPUTS in print_graph_commentNamhyung Kim
It missed to handle TRACE_BPUTS so messages recorded by trace_bputs() will be shown with symbol info unnecessarily. You can see it with the trace_printk sample code: # cd /sys/kernel/tracing/ # echo sys_sync > set_graph_function # echo 1 > options/sym-offset # echo function_graph > current_tracer Note that the sys_sync filter was there to prevent recording other functions and the sym-offset option was needed since the first message was called from a module init function so kallsyms doesn't have the symbol and omitted in the output. # cd ~/build/kernel # insmod samples/trace_printk/trace-printk.ko # cd - # head trace Before: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) | /* 0xffffffffa0002000: This is a static string that will use trace_bputs */ 1) | /* This is a dynamic string that will use trace_puts */ 1) | /* trace_printk_irq_work+0x5/0x7b [trace_printk]: (irq) This is a static string that will use trace_bputs */ 1) | /* (irq) This is a dynamic string that will use trace_puts */ 1) | /* (irq) This is a static string that will use trace_bprintk() */ 1) | /* (irq) This is a dynamic string that will use trace_printk */ After: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) | /* This is a static string that will use trace_bputs */ 1) | /* This is a dynamic string that will use trace_puts */ 1) | /* (irq) This is a static string that will use trace_bputs */ 1) | /* (irq) This is a dynamic string that will use trace_puts */ 1) | /* (irq) This is a static string that will use trace_bprintk() */ 1) | /* (irq) This is a dynamic string that will use trace_printk */ Link: http://lkml.kernel.org/r/20160901024354.13720-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01tracing/uprobe: Drop isdigit() check in create_trace_uprobeDmitry Safonov
It's useless. Before: [tracing]# echo 'p:test /a:0x0' >> uprobe_events [tracing]# echo 'p:test a:0x0' >> uprobe_events -bash: echo: write error: No such file or directory [tracing]# echo 'p:test 1:0x0' >> uprobe_events -bash: echo: write error: Invalid argument After: [tracing]# echo 'p:test 1:0x0' >> uprobe_events -bash: echo: write error: No such file or directory Link: http://lkml.kernel.org/r/20160825152110.25663-3-dsafonov@virtuozzo.com Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24ftrace: Add ftrace_graph_ret_addr() stack unwinding helpersJosh Poimboeuf
When function graph tracing is enabled for a function, ftrace modifies the stack by replacing the original return address with the address of a hook function (return_to_handler). Stack unwinders need a way to get the original return address. Add an arch-independent helper function for that named ftrace_graph_ret_addr(). This adds two variations of the function: one depends on HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state variable. The former is recommended because, in some cases, the latter can cause problems when the unwinder skips stack frames. It can get out of sync with the ret_stack index and wrong addresses can be reported for the stack trace. Once all arches have been ported to use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/36bd90f762fc5e5af3929e3797a68a64906421cf.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24ftrace: Add return address pointer to ftrace_ret_stackJosh Poimboeuf
Storing this value will help prevent unwinders from getting out of sync with the function graph tracer ret_stack. Now instead of needing a stateful iterator, they can compare the return address pointer to find the right ret_stack entry. Note that an array of 50 ftrace_ret_stack structs is allocated for every task. So when an arch implements this, it will add either 200 or 400 bytes of memory usage per task (depending on whether it's a 32-bit or 64-bit platform). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24ftrace: Only allocate the ret_stack 'fp' field when neededJosh Poimboeuf
This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined. On x86_64 with newer versions of gcc which have -mfentry, it saves 400 bytes per task. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5c7747d9ea7b5cb47ef0a8ce8a6cea6bf7aa94bf.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24ftrace: Remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from configJosh Poimboeuf
Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent from kconfig. This removes some config file pollution and simplifies the checking for the fp test. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2c4e5f05054d6d367f702fd153af7a0109dd5c81.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-23ftrace: kprobe: uprobe: Show u8/u16/u32/u64 types in decimalMasami Hiramatsu
Change kprobe/uprobe-tracer to show the arguments type-casted with u8/u16/u32/u64 in decimal digits instead of hexadecimal. To minimize compatibility issue, the arguments without type casting are typed by x64 (or x32 for 32bit arch) by default. Note: all arguments set by old perf probe without types are shown in decimal by default. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naohiro Aota <naohiro.aota@hgst.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/147151076135.12957.14684546093034343894.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-23ftrace: probe: Add README entries for k/uprobe-eventsMasami Hiramatsu
Add README entries for kprobe-events and uprobe-events. This allows user to check what options can be acceptable for running kernel. E.g. perf tools can choose correct types for the kernel. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naohiro Aota <naohiro.aota@hgst.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/147151069524.12957.12957179170304055028.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-23ftrace: kprobe: uprobe: Add x8/x16/x32/x64 for hexadecimal typesMasami Hiramatsu
Add x8/x16/x32/x64 for hexadecimal type casting to kprobe/uprobe event tracer. These type casts can be used for integer arguments for explicitly showing them in hexadecimal digits in formatted text. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naohiro Aota <naohiro.aota@hgst.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/147151067029.12957.11591314629326414783.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor overlapping changes for both merge conflicts. Resolution work done by Stephen Rothwell was used as a reference. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-16block: Fix secure eraseAdrian Hunter
Commit 288dab8a35a0 ("block: add a separate operation type for secure erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering all the places REQ_OP_DISCARD was being used to mean either. Fix those. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: 288dab8a35a0 ("block: add a separate operation type for secure erase") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-13bpf: allow bpf_get_prandom_u32() to be used in tracingAlexei Starovoitov
bpf_get_prandom_u32() was initially introduced for socket filters and later requested numberous times to be added to tracing bpf programs for the same reason as in socket filters: to be able to randomly select incoming events. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13bpf: Add bpf_current_task_under_cgroup helperSargun Dhillon
This adds a bpf helper that's similar to the skb_in_cgroup helper to check whether the probe is currently executing in the context of a specific subset of the cgroupsv2 hierarchy. It does this based on membership test for a cgroup arraymap. It is invalid to call this in an interrupt, and it'll return an error. The helper is primarily to be used in debugging activities for containers, where you may have multiple programs running in a given top-level "container". Signed-off-by: Sargun Dhillon <sargun@sargun.me> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-03Merge tag 'trace-v4.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few updates and fixes: - move the suppressing of the __builtin_return_address >0 warning to the tracing directory only. - metag recordmcount fix for newer glibc's - two tracing histogram fixes that were reported by KASAN" * tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix use-after-free in hist_register_trigger() tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all Makefile: Mute warning for __builtin_return_address(>0) for tracing only ftrace/recordmcount: Work around for addition of metag magic but not relocations
2016-08-02tracing: Fix use-after-free in hist_register_trigger()Tom Zanussi
This fixes a use-after-free case flagged by KASAN; make sure the test happens before the potential free in this case. Link: http://lkml.kernel.org/r/48fd74ab61bebd7dca9714386bb47d7c5ccd6a7b.1467247517.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_allSteven Rostedt
While running tools/testing/selftests test suite with KASAN, Dmitry Vyukov hit the following use-after-free report: ================================================================== BUG: KASAN: use-after-free in hist_unreg_all+0x1a1/0x1d0 at addr ffff880031632cc0 Read of size 8 by task ftracetest/7413 ================================================================== BUG kmalloc-128 (Not tainted): kasan: bad access detected ------------------------------------------------------------------ This fixes the problem, along with the same problem in hist_enable_unreg_all(). Link: http://lkml.kernel.org/r/c3d05b79e42555b6e36a3a99aae0e37315ee5304.1467247517.git.tom.zanussi@linux.intel.com Cc: Dmitry Vyukov <dvyukov@google.com> [Copied Steve's hist_enable_unreg_all() fix to hist_unreg_all()] Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02Makefile: Mute warning for __builtin_return_address(>0) for tracing onlySteven Rostedt
With the latest gcc compilers, they give a warning if __builtin_return_address() parameter is greater than 0. That is because if it is used by a function called by a top level function (or in the case of the kernel, by assembly), it can try to access stack frames outside the stack and crash the system. The tracing system uses __builtin_return_address() of up to 2! But it is well aware of the dangers that it may have, and has even added precautions to protect against it (see the thunk code in arch/x86/entry/thunk*.S) Linus originally added KBUILD_CFLAGS that would suppress the warning for the entire kernel, as simply adding KBUILD_CFLAGS to the tracing directory wouldn't work. The tracing directory plays a bit with the CFLAGS and requires a little more logic. This adds that special logic to only suppress the warning for the tracing directory. If it is used anywhere else outside of tracing, the warning will still be triggered. Link: http://lkml.kernel.org/r/20160728223043.51996267@grimm.local.home Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>