author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-25 20:20:41 (GMT) |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-25 20:20:41 (GMT) |
commit | 7e4dc77b2869a683fc43c0394fca5441816390ba (patch) | |
tree | 62e734c599bc1da2712fdb63be996622c415a83a /tools/perf/builtin-stat.c | |
parent | 89e7eb098adfe342bc036f00201eb579d448f033 (diff) | |
parent | 5048c2af078d5976895d521262a8802ea791f3b0 (diff) | |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"With over 300 commits it's been a busy cycle - with most of the work
concentrated on the tooling side (as it should).
The main kernel side enhancements were:
- Add per event callchain limit: Recently we introduced a sysctl to
tune the max-stack for all events for which callchains were
requested:
$ sysctl kernel.perf_event_max_stack
kernel.perf_event_max_stack = 127
Now this patch introduces a way to configure this per event, i.e.
this becomes possible:
$ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a
allowing finer tuning of how much buffer space callchains use.
This uses a u16 from the reserved space at the end, leaving
another u16 for future use.
There has been interest in even finer tuning, namely to control the
max stack for kernel and userspace callchains separately. Further
discussion is needed; we may, for instance, use the remaining u16 for
that and, when it is present, assume that the sample_max_stack
introduced in this patch applies to the kernel while the remaining
u16 limits the userspace callchain (Arnaldo Carvalho de
Melo)
- Optimize AUX event (hardware assisted side-band event) delivery
(Kan Liang)
- Rework Intel family name macro usage (this is partially x86 arch
work) (Dave Hansen)
- Refine and fix Intel LBR support (David Carrillo-Cisneros)
- Add support for Intel 'TopDown' events (Andi Kleen)
- Intel uncore PMU driver fixes and enhancements (Kan Liang)
- ... other misc changes.
Here's an incomplete list of the tooling enhancements (but there's
much more, see the shortlog and the git log for details):
- Support cross unwinding, i.e. collecting '--call-graph dwarf'
perf.data files on one machine and then analyzing them on another
machine of a different hardware architecture. This makes it
possible, for instance, to run:
$ perf record -a --call-graph dwarf
on an x86-32 or aarch64 system and then run 'perf report' on the
resulting perf.data on an x86_64 workstation (He Kuang)
- Allow reading from a backward ring buffer (one set up via
sys_perf_event_open() with perf_event_attr.write_backward = 1)
(Wang Nan)
- Finish merging initial SDT (Statically Defined Traces) support, see
cset comments for details about how it all works (Masami Hiramatsu)
- Support attaching eBPF programs to tracepoints (Wang Nan)
- Add demangling of symbols in programs written in the Rust language
(David Tolnay)
- Add support for tracepoints in the python binding, including an
example that sets up and parses sched:sched_switch events,
tools/perf/python/tracepoint.py (Jiri Olsa)
- Introduce --stdio-color to set up the color output mode selection
in 'annotate' and 'report', allowing these tools to emit color
escape sequences when their output is redirected (Arnaldo Carvalho
de Melo)
- Add 'callindent' option to 'perf script -F', to indent the Intel PT
call stack, making this output more ftrace-like (Adrian Hunter,
Andi Kleen)
- Allow dumping the object files generated by llvm when processing
eBPF scriptlet events (Wang Nan)
- Add stackcollapse.py script to help generate flame graphs (Paolo
Bonzini)
- Add --ldlat option to 'perf mem' to specify the load latency for
load events (e.g. cpu/mem-loads/ ) (Jiri Olsa)
- Tooling support for Intel TopDown counters, recently added to the
kernel (Andi Kleen)"
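Two of the items in the pull message surface as perf_event_attr fields: the per-event callchain limit (sample_max_stack) and the backward ring buffer (write_backward). The following is a minimal sketch that only fills in the attribute structure. It assumes kernel UAPI headers that already contain both fields (4.8 or later); the helper name sketch_init_attr and the choice of a software CPU-clock event with a sample period of 100000 are made up for the example, and it deliberately stops short of calling perf_event_open().

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper, not part of perf: prepare a sampling event whose
 * callchains are capped at max_stack frames and, optionally, whose ring
 * buffer is written backward.
 */
static void sketch_init_attr(struct perf_event_attr *attr,
			     __u16 max_stack, int backward)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_CPU_CLOCK;	/* arbitrary choice for the sketch */
	attr->sample_period = 100000;			/* arbitrary choice for the sketch */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;
	attr->sample_max_stack = max_stack;		/* per-event callchain limit */
	attr->write_backward = backward ? 1 : 0;	/* backward ring buffer */
	attr->disabled = 1;				/* caller enables it later */
}
```

A caller would pass the filled structure to the perf_event_open() system call; whether that succeeds depends on privileges and on the running kernel, which is why the sketch leaves that step out.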
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
perf tests: Add is_printable_array test
perf tools: Make is_printable_array global
perf script python: Fix string vs byte array resolving
perf probe: Warn unmatched function filter correctly
perf cpu_map: Add more helpers
perf stat: Balance opening and reading events
tools: Copy linux/{hash,poison}.h and check for drift
perf tools: Remove include/linux/list.h from perf's MANIFEST
tools: Copy the bitops files accessed from the kernel and check for drift
Remove: kernel unistd*h files from perf's MANIFEST, not used
perf tools: Remove tools/perf/util/include/linux/const.h
perf tools: Remove tools/perf/util/include/asm/byteorder.h
perf tools: Add missing linux/compiler.h include to perf-sys.h
perf jit: Remove some no-op error handling
perf jit: Add missing curly braces
objtool: Initialize variable to silence old compiler
objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
perf record: Add --tail-synthesize option
perf session: Don't warn about out of order event if write_backward is used
perf tools: Enable overwrite settings
...
Diffstat (limited to 'tools/perf/builtin-stat.c')
-rw-r--r-- | tools/perf/builtin-stat.c | 199 |
1 files changed, 171 insertions, 28 deletions
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ee7ada7..0c16d20 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,10 +59,13 @@
 #include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
+#include "util/group.h"
 #include "util/session.h"
 #include "util/tool.h"
+#include "util/group.h"
 #include "asm/bug.h"
 
+#include <api/fs/fs.h>
 #include <stdlib.h>
 #include <sys/prctl.h>
 #include <locale.h>
@@ -98,6 +101,15 @@ static const char * transaction_limited_attrs = {
 	"}"
 };
 
+static const char * topdown_attrs[] = {
+	"topdown-total-slots",
+	"topdown-slots-retired",
+	"topdown-recovery-bubbles",
+	"topdown-fetch-bubbles",
+	"topdown-slots-issued",
+	NULL,
+};
+
 static struct perf_evlist *evsel_list;
 
 static struct target target = {
@@ -112,6 +124,7 @@ static volatile pid_t child_pid = -1;
 static bool null_run = false;
 static int detailed_run = 0;
 static bool transaction_run;
+static bool topdown_run = false;
 static bool big_num = true;
 static int big_num_opt = -1;
 static const char *csv_sep = NULL;
@@ -124,6 +137,7 @@ static unsigned int initial_delay = 0;
 static unsigned int unit_width = 4; /* strlen("unit") */
 static bool forever = false;
 static bool metric_only = false;
+static bool force_metric_only = false;
 static struct timespec ref_time;
 static struct cpu_map *aggr_map;
 static aggr_get_id_t aggr_get_id;
@@ -276,8 +290,12 @@ perf_evsel__write_stat_event(struct perf_evsel *counter, u32 cpu, u32 thread,
 static int read_counter(struct perf_evsel *counter)
 {
 	int nthreads = thread_map__nr(evsel_list->threads);
-	int ncpus = perf_evsel__nr_cpus(counter);
-	int cpu, thread;
+	int ncpus, cpu, thread;
+
+	if (target__has_cpu(&target))
+		ncpus = perf_evsel__nr_cpus(counter);
+	else
+		ncpus = 1;
 
 	if (!counter->supported)
 		return -ENOENT;
@@ -317,7 +335,7 @@ static void read_counters(bool close_counters)
 {
 	struct perf_evsel *counter;
 
-	evlist__for_each(evsel_list, counter) {
+	evlist__for_each_entry(evsel_list, counter) {
 		if (read_counter(counter))
 			pr_debug("failed to read counter %s\n", counter->name);
 
@@ -403,7 +421,7 @@ static int perf_stat_synthesize_config(bool is_pipe)
 	 * Synthesize other events stuff not carried within
 	 * attr event - unit, scale, name
 	 */
-	evlist__for_each(evsel_list, counter) {
+	evlist__for_each_entry(evsel_list, counter) {
 		if (!counter->supported)
 			continue;
 
@@ -536,7 +554,7 @@ static int __run_perf_stat(int argc, const char **argv)
 	if (group)
 		perf_evlist__set_leader(evsel_list);
 
-	evlist__for_each(evsel_list, counter) {
+	evlist__for_each_entry(evsel_list, counter) {
 try_again:
 		if (create_perf_stat_counter(counter) < 0) {
 			/*
@@ -582,7 +600,7 @@ try_again:
 	if (perf_evlist__apply_filters(evsel_list, &counter)) {
 		error("failed to set filter \"%s\" on event %s with %d (%s)\n",
 			counter->filter, perf_evsel__name(counter), errno,
-			strerror_r(errno, msg, sizeof(msg)));
+			str_error_r(errno, msg, sizeof(msg)));
 		return -1;
 	}
 
@@ -623,7 +641,7 @@ try_again:
 		wait(&status);
 
 		if (workload_exec_errno) {
-			const char *emsg = strerror_r(workload_exec_errno, msg, sizeof(msg));
+			const char *emsg = str_error_r(workload_exec_errno, msg, sizeof(msg));
 			pr_err("Workload failed: %s\n", emsg);
 			return -1;
 		}
@@ -1120,7 +1138,7 @@ static void aggr_update_shadow(void)
 
 	for (s = 0; s < aggr_map->nr; s++) {
 		id = aggr_map->map[s];
-		evlist__for_each(evsel_list, counter) {
+		evlist__for_each_entry(evsel_list, counter) {
 			val = 0;
 			for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
 				s2 = aggr_get_id(evsel_list->cpus, cpu);
@@ -1159,7 +1177,7 @@ static void print_aggr(char *prefix)
 
 		id = aggr_map->map[s];
 		first = true;
-		evlist__for_each(evsel_list, counter) {
+		evlist__for_each_entry(evsel_list, counter) {
 			val = ena = run = 0;
 			nr = 0;
 			for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
@@ -1278,7 +1296,7 @@ static void print_no_aggr_metric(char *prefix)
 
 		if (prefix)
 			fputs(prefix, stat_config.output);
-		evlist__for_each(evsel_list, counter) {
+		evlist__for_each_entry(evsel_list, counter) {
 			if (first) {
 				aggr_printout(counter, cpu, 0);
 				first = false;
@@ -1302,7 +1320,15 @@ static int aggr_header_lens[] = {
 	[AGGR_GLOBAL] = 0,
 };
 
-static void print_metric_headers(char *prefix)
+static const char *aggr_header_csv[] = {
+	[AGGR_CORE] = "core,cpus,",
+	[AGGR_SOCKET] = "socket,cpus",
+	[AGGR_NONE] = "cpu,",
+	[AGGR_THREAD] = "comm-pid,",
+	[AGGR_GLOBAL] = ""
+};
+
+static void print_metric_headers(const char *prefix, bool no_indent)
 {
 	struct perf_stat_output_ctx out;
 	struct perf_evsel *counter;
@@ -1313,12 +1339,18 @@ static void print_metric_headers(char *prefix)
 	if (prefix)
 		fprintf(stat_config.output, "%s", prefix);
 
-	if (!csv_output)
+	if (!csv_output && !no_indent)
 		fprintf(stat_config.output, "%*s",
 			aggr_header_lens[stat_config.aggr_mode], "");
+	if (csv_output) {
+		if (stat_config.interval)
+			fputs("time,", stat_config.output);
+		fputs(aggr_header_csv[stat_config.aggr_mode],
+			stat_config.output);
+	}
 
 	/* Print metrics headers only */
-	evlist__for_each(evsel_list, counter) {
+	evlist__for_each_entry(evsel_list, counter) {
 		os.evsel = counter;
 		out.ctx = &os;
 		out.print_metric = print_metric_header;
@@ -1338,28 +1370,40 @@ static void print_interval(char *prefix, struct timespec *ts)
 
 	sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
 
-	if (num_print_interval == 0 && !csv_output && !metric_only) {
+	if (num_print_interval == 0 && !csv_output) {
 		switch (stat_config.aggr_mode) {
 		case AGGR_SOCKET:
-			fprintf(output, "# time socket cpus counts %*s events\n", unit_width, "unit");
+			fprintf(output, "# time socket cpus");
+			if (!metric_only)
+				fprintf(output, " counts %*s events\n", unit_width, "unit");
 			break;
 		case AGGR_CORE:
-			fprintf(output, "# time core cpus counts %*s events\n", unit_width, "unit");
+			fprintf(output, "# time core cpus");
+			if (!metric_only)
+				fprintf(output, " counts %*s events\n", unit_width, "unit");
 			break;
 		case AGGR_NONE:
-			fprintf(output, "# time CPU counts %*s events\n", unit_width, "unit");
+			fprintf(output, "# time CPU");
+			if (!metric_only)
+				fprintf(output, " counts %*s events\n", unit_width, "unit");
 			break;
 		case AGGR_THREAD:
-			fprintf(output, "# time comm-pid counts %*s events\n", unit_width, "unit");
+			fprintf(output, "# time comm-pid");
+			if (!metric_only)
+				fprintf(output, " counts %*s events\n", unit_width, "unit");
 			break;
 		case AGGR_GLOBAL:
 		default:
-			fprintf(output, "# time counts %*s events\n", unit_width, "unit");
+			fprintf(output, "# time");
+			if (!metric_only)
+				fprintf(output, " counts %*s events\n", unit_width, "unit");
 		case AGGR_UNSET:
 			break;
 		}
 	}
 
+	if (num_print_interval == 0 && metric_only)
+		print_metric_headers(" ", true);
 	if (++num_print_interval == 25)
 		num_print_interval = 0;
 }
@@ -1428,8 +1472,8 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
 	if (metric_only) {
 		static int num_print_iv;
 
-		if (num_print_iv == 0)
-			print_metric_headers(prefix);
+		if (num_print_iv == 0 && !interval)
+			print_metric_headers(prefix, false);
 		if (num_print_iv++ == 25)
 			num_print_iv = 0;
 		if (stat_config.aggr_mode == AGGR_GLOBAL && prefix)
@@ -1442,11 +1486,11 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
 		print_aggr(prefix);
 		break;
 	case AGGR_THREAD:
-		evlist__for_each(evsel_list, counter)
+		evlist__for_each_entry(evsel_list, counter)
 			print_aggr_thread(counter, prefix);
 		break;
 	case AGGR_GLOBAL:
-		evlist__for_each(evsel_list, counter)
+		evlist__for_each_entry(evsel_list, counter)
 			print_counter_aggr(counter, prefix);
 		if (metric_only)
 			fputc('\n', stat_config.output);
@@ -1455,7 +1499,7 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
 		if (metric_only)
 			print_no_aggr_metric(prefix);
 		else {
-			evlist__for_each(evsel_list, counter)
+			evlist__for_each_entry(evsel_list, counter)
 				print_counter(counter, prefix);
 		}
 		break;
@@ -1520,6 +1564,14 @@ static int stat__set_big_num(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int enable_metric_only(const struct option *opt __maybe_unused,
+			      const char *s __maybe_unused, int unset)
+{
+	force_metric_only = true;
+	metric_only = !unset;
+	return 0;
+}
+
 static const struct option stat_options[] = {
 	OPT_BOOLEAN('T', "transaction", &transaction_run,
 		    "hardware transaction statistics"),
@@ -1578,8 +1630,10 @@ static const struct option stat_options[] = {
 		     "aggregate counts per thread", AGGR_THREAD),
 	OPT_UINTEGER('D', "delay", &initial_delay,
 		     "ms to wait before starting measurement after program start"),
-	OPT_BOOLEAN(0, "metric-only", &metric_only,
-			"Only print computed metrics. No raw values"),
+	OPT_CALLBACK_NOOPT(0, "metric-only", &metric_only, NULL,
+			"Only print computed metrics. No raw values", enable_metric_only),
+	OPT_BOOLEAN(0, "topdown", &topdown_run,
+			"measure topdown level 1 statistics"),
 	OPT_END()
 };
 
@@ -1772,12 +1826,62 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	return 0;
 }
 
+static int topdown_filter_events(const char **attr, char **str, bool use_group)
+{
+	int off = 0;
+	int i;
+	int len = 0;
+	char *s;
+
+	for (i = 0; attr[i]; i++) {
+		if (pmu_have_event("cpu", attr[i])) {
+			len += strlen(attr[i]) + 1;
+			attr[i - off] = attr[i];
+		} else
+			off++;
+	}
+	attr[i - off] = NULL;
+
+	*str = malloc(len + 1 + 2);
+	if (!*str)
+		return -1;
+	s = *str;
+	if (i - off == 0) {
+		*s = 0;
+		return 0;
+	}
+	if (use_group)
+		*s++ = '{';
+	for (i = 0; attr[i]; i++) {
+		strcpy(s, attr[i]);
+		s += strlen(s);
+		*s++ = ',';
+	}
+	if (use_group) {
+		s[-1] = '}';
+		*s = 0;
+	} else
+		s[-1] = 0;
+	return 0;
+}
+
+__weak bool arch_topdown_check_group(bool *warn)
+{
+	*warn = false;
+	return false;
+}
+
+__weak void arch_topdown_group_warn(void)
+{
+}
+
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
  */
 static int add_default_attributes(void)
 {
+	int err;
 	struct perf_event_attr default_attrs0[] = {
 
 	{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
@@ -1896,7 +2000,6 @@ static int add_default_attributes(void)
 		return 0;
 
 	if (transaction_run) {
-		int err;
 		if (pmu_have_event("cpu", "cycles-ct") &&
 		    pmu_have_event("cpu", "el-start"))
 			err = parse_events(evsel_list, transaction_attrs, NULL);
@@ -1909,6 +2012,46 @@ static int add_default_attributes(void)
 		return 0;
 	}
 
+	if (topdown_run) {
+		char *str = NULL;
+		bool warn = false;
+
+		if (stat_config.aggr_mode != AGGR_GLOBAL &&
+		    stat_config.aggr_mode != AGGR_CORE) {
+			pr_err("top down event configuration requires --per-core mode\n");
+			return -1;
+		}
+		stat_config.aggr_mode = AGGR_CORE;
+		if (nr_cgroups || !target__has_cpu(&target)) {
+			pr_err("top down event configuration requires system-wide mode (-a)\n");
+			return -1;
+		}
+
+		if (!force_metric_only)
+			metric_only = true;
+		if (topdown_filter_events(topdown_attrs, &str,
+				arch_topdown_check_group(&warn)) < 0) {
+			pr_err("Out of memory\n");
+			return -1;
+		}
+		if (topdown_attrs[0] && str) {
+			if (warn)
+				arch_topdown_group_warn();
+			err = parse_events(evsel_list, str, NULL);
+			if (err) {
+				fprintf(stderr,
+					"Cannot set up top down events %s: %d\n",
+					str, err);
+				free(str);
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "System does not support topdown\n");
+			return -1;
+		}
+		free(str);
+	}
+
 	if (!evsel_list->nr_entries) {
 		if (target__has_cpu(&target))
 			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
@@ -2010,7 +2153,7 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 	const char **argv = session->header.env.cmdline_argv;
 	int argc = session->header.env.nr_cmdline;
 
-	evlist__for_each(evsel_list, counter)
+	evlist__for_each_entry(evsel_list, counter)
 		perf_stat_process_counter(&stat_config, counter);
 
 	if (stat_round->type == PERF_STAT_ROUND_TYPE__FINAL)
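The --topdown setup added in this diff can be read as two small steps: default to metric-only output unless the user passed --metric-only or --no-metric-only explicitly, then build an event string from whichever topdown events the PMU exposes. What follows is a standalone sketch of that logic, not the perf code itself. sketch_have_event() is a hypothetical stand-in for pmu_have_event("cpu", ...) that pretends one event is missing, and every other sketch_* name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pmu_have_event("cpu", name). */
static bool sketch_have_event(const char *name)
{
	return strcmp(name, "topdown-recovery-bubbles") != 0;
}

/*
 * Compact attr[] in place so it holds only supported events, then join
 * the survivors with commas, wrapped in braces when use_group is set.
 * On success *str is a malloc'ed string the caller must free.
 */
static int sketch_filter_events(const char **attr, char **str, bool use_group)
{
	size_t len = 0;
	int i, kept = 0;
	char *s;

	for (i = 0; attr[i]; i++) {
		if (sketch_have_event(attr[i])) {
			len += strlen(attr[i]) + 1;	/* name plus separator */
			attr[kept++] = attr[i];
		}
	}
	attr[kept] = NULL;

	*str = malloc(len + 1 + 2);	/* room for '{', '}' and the NUL */
	if (!*str)
		return -1;
	s = *str;
	if (kept == 0) {
		*s = '\0';
		return 0;
	}
	if (use_group)
		*s++ = '{';
	for (i = 0; attr[i]; i++) {
		size_t n = strlen(attr[i]);

		memcpy(s, attr[i], n);
		s += n;
		*s++ = ',';
	}
	if (use_group) {
		s[-1] = '}';	/* overwrite the trailing comma */
		*s = '\0';
	} else {
		s[-1] = '\0';
	}
	return 0;
}

struct sketch_flags {
	bool metric_only;
	bool force_metric_only;
};

/* Same idea as enable_metric_only(): record that the user chose explicitly. */
static void sketch_metric_only_opt(struct sketch_flags *f, int unset)
{
	f->force_metric_only = true;
	f->metric_only = !unset;
}

/* Same idea as the start of the topdown_run branch. */
static void sketch_topdown_defaults(struct sketch_flags *f)
{
	if (!f->force_metric_only)
		f->metric_only = true;
}
```

With the stub above, the list topdown-total-slots, topdown-recovery-bubbles, topdown-fetch-bubbles and use_group set produces {topdown-total-slots,topdown-fetch-bubbles}. Calling sketch_metric_only_opt() with unset = 1 before sketch_topdown_defaults() leaves metric_only false, which is the reason the diff needs the separate force_metric_only flag.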