diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2015-11-04 04:38:09 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2015-11-04 04:38:09 +0300 |
commit | b02ac6b18cd4e2c76bf0a102c20c429b973f5f76 (patch) | |
tree | 87b3648f448627d61cb9ba32511584d6318b7bb6 /arch/powerpc | |
parent | 105ff3cbf225036b75a6a46c96d1ddce8e7bdc66 (diff) | |
parent | bebd23a2ed31d47e7dd746d3b125068aa2c42d85 (diff) | |
download | linux-b02ac6b18cd4e2c76bf0a102c20c429b973f5f76.tar.xz |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Improve accuracy of perf/sched clock on x86. (Adrian Hunter)
- Intel DS and BTS updates. (Alexander Shishkin)
- Intel cstate PMU support. (Kan Liang)
- Add group read support to perf_event_read(). (Peter Zijlstra)
- Branch call hardware sampling support, implemented on x86 and
PowerPC. (Stephane Eranian)
- Event groups transactional interface enhancements. (Sukadev
Bhattiprolu)
- Enable proper x86/intel/uncore PMU support on multi-segment PCI
systems. (Taku Izumi)
- ... misc fixes and cleanups.
The perf tooling team was very busy again with 200+ commits, the full
diff doesn't fit into lkml size limits. Here's an (incomplete) list
of the tooling highlights:
New features:
- Change the default event used in all tools (record/top): use the
most precise "cycles" hw counter available, i.e. when the user
doesn't specify any event, it will try using cycles:ppp, cycles:pp,
etc and fall back transparently until it finds a working counter.
(Arnaldo Carvalho de Melo)
- Integration of perf with eBPF that, given an eBPF .c source file
(or .o file built for the 'bpf' target with clang), will get it
automatically built, validated and loaded into the kernel via the
sys_bpf syscall, which can then be used and seen using 'perf trace'
and other tools.
(Wang Nan)
Various user interface improvements:
- Automatic pager invocation on long help output. (Namhyung Kim)
- Search for more options when passing args to -h, e.g.: (Arnaldo
Carvalho de Melo)
$ perf report -h interface
Usage: perf report [<options>]
--gtk Use the GTK2 interface
--stdio Use the stdio interface
--tui Use the TUI interface
- Show ordered command line options when -h is used or when an
unknown option is specified. (Arnaldo Carvalho de Melo)
- If options are passed after -h, show just its descriptions, not all
options. (Arnaldo Carvalho de Melo)
- Implement column based horizontal scrolling in the hists browser
(top, report), making it possible to use the TUI for things like
'perf mem report' where there are many more columns than can fit in
a terminal. (Arnaldo Carvalho de Melo)
- Enhance the error reporting of tracepoint event parsing, e.g.:
$ oldperf record -e sched:sched_switc usleep 1
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Run 'perf list' for a list of valid events
Now we get the much nicer:
$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ can't access trace events
Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'
And after we have those mount point permissions fixed:
$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
I.e. basically now the event parsing routing uses the strerror_open()
routines introduced by and used in 'perf trace' work. (Jiri Olsa)
- Fail properly when pattern matching fails to find a tracepoint,
i.e. '-e non:existent' was being correctly handled, with a proper
error message about that not being a valid event, but '-e
non:existent*' wasn't, fix it. (Jiri Olsa)
- Do event name substring search as last resort in 'perf list'.
(Arnaldo Carvalho de Melo)
E.g.:
# perf list clock
List of pre-defined events (to be used in -e):
cpu-clock [Software event]
task-clock [Software event]
uncore_cbox_0/clockticks/ [Kernel PMU event]
uncore_cbox_1/clockticks/ [Kernel PMU event]
kvm:kvm_pvclock_update [Tracepoint event]
kvm:kvm_update_master_clock [Tracepoint event]
power:clock_disable [Tracepoint event]
power:clock_enable [Tracepoint event]
power:clock_set_rate [Tracepoint event]
syscalls:sys_enter_clock_adjtime [Tracepoint event]
syscalls:sys_enter_clock_getres [Tracepoint event]
syscalls:sys_enter_clock_gettime [Tracepoint event]
syscalls:sys_enter_clock_nanosleep [Tracepoint event]
syscalls:sys_enter_clock_settime [Tracepoint event]
syscalls:sys_exit_clock_adjtime [Tracepoint event]
syscalls:sys_exit_clock_getres [Tracepoint event]
syscalls:sys_exit_clock_gettime [Tracepoint event]
syscalls:sys_exit_clock_nanosleep [Tracepoint event]
syscalls:sys_exit_clock_settime [Tracepoint event]
Intel PT hardware tracing enhancements:
- Accept a zero --itrace period, meaning "as often as possible". In
the case of Intel PT that is the same as a period of 1 and a unit
of 'instructions' (i.e. --itrace=i1i). (Adrian Hunter)
- Harmonize itrace's synthesized callchains with the existing
--max-stack tool option. (Adrian Hunter)
- Allow time to be displayed in nanoseconds in 'perf script'.
(Adrian Hunter)
- Fix potential infinite loop when handling Intel PT timestamps.
(Adrian Hunter)
- Slighly improve Intel PT debug logging. (Adrian Hunter)
- Warn when AUX data has been lost, just like when processing
PERF_RECORD_LOST. (Adrian Hunter)
- Further document export-to-postgresql.py script. (Adrian Hunter)
- Add option to synthesize branch stack from auxtrace data. (Adrian
Hunter)
Misc notable changes:
- Switch the default callchain output mode to 'graph,0.5,caller', to
make it look like the default for other tools, reducing the
learning curve for people used to 'caller' based viewing. (Arnaldo
Carvalho de Melo)
- various call chain usability enhancements. (Namhyung Kim)
- Introduce the 'P' event modifier, meaning 'max precision level,
please', i.e.:
$ perf record -e cycles:P usleep 1
Is now similar to:
$ perf record usleep 1
Useful, for instance, when specifying multiple events. (Jiri Olsa)
- Add 'socket' sort entry, to sort by the processor socket in 'perf
top' and 'perf report'. (Kan Liang)
- Introduce --socket-filter to 'perf report', for filtering by
processor socket. (Kan Liang)
- Add new "Zoom into Processor Socket" operation in the perf hists
browser, used in 'perf top' and 'perf report'. (Kan Liang)
- Allow probing on kmodules without DWARF. (Masami Hiramatsu)
- Fix 'perf probe -l' for probes added to kernel module functions.
(Masami Hiramatsu)
- Preparatory work for the 'perf stat record' feature that will allow
generating perf.data files with counting data in addition to the
sampling mode we have now (Jiri Olsa)
- Update libtraceevent KVM plugin. (Paolo Bonzini)
- ... plus lots of other enhancements that I failed to list properly,
by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
Nan, Yang Shi and Yunlong Song"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
perf unwind: Pass symbol source to libunwind
tools build: Fix libiberty feature detection
perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
perf record: Add clang options for compiling BPF scripts
perf bpf: Attach eBPF filter to perf event
perf tools: Make sure fixdep is built before libbpf
perf script: Enable printing of branch stack
perf trace: Add cmd string table to decode sys_bpf first arg
perf bpf: Collect perf_evsel in BPF object files
perf tools: Load eBPF object into kernel
perf tools: Create probe points for BPF programs
perf tools: Enable passing bpf object file to --event
perf ebpf: Add the libbpf glue
perf tools: Make perf depend on libbpf
perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
perf tools: Enable pre-event inherit setting by config terms
perf symbols: we can now read separate debug-info files based on a build ID
perf symbols: Fix type error when reading a build-id
perf tools: Search for more options when passing args to -h
perf stat: Cache aggregated map entries in extra cpumap
...
Diffstat (limited to 'arch/powerpc')
-rw-r--r-- | arch/powerpc/perf/core-book3s.c | 36 | ||||
-rw-r--r-- | arch/powerpc/perf/hv-24x7.c | 166 | ||||
-rw-r--r-- | arch/powerpc/perf/power8-pmu.c | 3 |
3 files changed, 197 insertions, 8 deletions
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index b0382f3f1095..d1e65ce545b3 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -48,7 +48,7 @@ struct cpu_hw_events { unsigned long amasks[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES]; unsigned long avalues[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES]; - unsigned int group_flag; + unsigned int txn_flags; int n_txn_start; /* BHRB bits */ @@ -1441,7 +1441,7 @@ static int power_pmu_add(struct perf_event *event, int ef_flags) * skip the schedulability test here, it will be performed * at commit time(->commit_txn) as a whole */ - if (cpuhw->group_flag & PERF_EVENT_TXN) + if (cpuhw->txn_flags & PERF_PMU_TXN_ADD) goto nocheck; if (check_excludes(cpuhw->event, cpuhw->flags, n0, 1)) @@ -1586,13 +1586,22 @@ static void power_pmu_stop(struct perf_event *event, int ef_flags) * Start group events scheduling transaction * Set the flag to make pmu::enable() not perform the * schedulability test, it will be performed at commit time + * + * We only support PERF_PMU_TXN_ADD transactions. Save the + * transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD + * transactions. */ -static void power_pmu_start_txn(struct pmu *pmu) +static void power_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags) { struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events); + WARN_ON_ONCE(cpuhw->txn_flags); /* txn already in flight */ + + cpuhw->txn_flags = txn_flags; + if (txn_flags & ~PERF_PMU_TXN_ADD) + return; + perf_pmu_disable(pmu); - cpuhw->group_flag |= PERF_EVENT_TXN; cpuhw->n_txn_start = cpuhw->n_events; } @@ -1604,8 +1613,15 @@ static void power_pmu_start_txn(struct pmu *pmu) static void power_pmu_cancel_txn(struct pmu *pmu) { struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events); + unsigned int txn_flags; + + WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */ + + txn_flags = cpuhw->txn_flags; + cpuhw->txn_flags = 0; + if (txn_flags & ~PERF_PMU_TXN_ADD) + return; - cpuhw->group_flag &= ~PERF_EVENT_TXN; perf_pmu_enable(pmu); } @@ -1621,7 +1637,15 @@ static int power_pmu_commit_txn(struct pmu *pmu) if (!ppmu) return -EAGAIN; + cpuhw = this_cpu_ptr(&cpu_hw_events); + WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */ + + if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD) { + cpuhw->txn_flags = 0; + return 0; + } + n = cpuhw->n_events; if (check_excludes(cpuhw->event, cpuhw->flags, 0, n)) return -EAGAIN; @@ -1632,7 +1656,7 @@ static int power_pmu_commit_txn(struct pmu *pmu) for (i = cpuhw->n_txn_start; i < n; ++i) cpuhw->event[i]->hw.config = cpuhw->events[i]; - cpuhw->group_flag &= ~PERF_EVENT_TXN; + cpuhw->txn_flags = 0; perf_pmu_enable(pmu); return 0; } diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 527c8b98e97e..9f9dfda9ed2c 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -142,6 +142,15 @@ static struct attribute_group event_long_desc_group = { static struct kmem_cache *hv_page_cache; +DEFINE_PER_CPU(int, hv_24x7_txn_flags); +DEFINE_PER_CPU(int, hv_24x7_txn_err); + +struct hv_24x7_hw { + struct perf_event *events[255]; +}; + +DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw); + /* * request_buffer and result_buffer are not required to be 4k aligned, * but are not allowed to cross any 4k boundary. Aligning them to 4k is @@ -1231,9 +1240,48 @@ static void update_event_count(struct perf_event *event, u64 now) static void h_24x7_event_read(struct perf_event *event) { u64 now; + struct hv_24x7_request_buffer *request_buffer; + struct hv_24x7_hw *h24x7hw; + int txn_flags; + + txn_flags = __this_cpu_read(hv_24x7_txn_flags); + + /* + * If in a READ transaction, add this counter to the list of + * counters to read during the next HCALL (i.e commit_txn()). + * If not in a READ transaction, go ahead and make the HCALL + * to read this counter by itself. + */ + + if (txn_flags & PERF_PMU_TXN_READ) { + int i; + int ret; - now = h_24x7_get_value(event); - update_event_count(event, now); + if (__this_cpu_read(hv_24x7_txn_err)) + return; + + request_buffer = (void *)get_cpu_var(hv_24x7_reqb); + + ret = add_event_to_24x7_request(event, request_buffer); + if (ret) { + __this_cpu_write(hv_24x7_txn_err, ret); + } else { + /* + * Assoicate the event with the HCALL request index, + * so ->commit_txn() can quickly find/update count. + */ + i = request_buffer->num_requests - 1; + + h24x7hw = &get_cpu_var(hv_24x7_hw); + h24x7hw->events[i] = event; + put_cpu_var(h24x7hw); + } + + put_cpu_var(hv_24x7_reqb); + } else { + now = h_24x7_get_value(event); + update_event_count(event, now); + } } static void h_24x7_event_start(struct perf_event *event, int flags) @@ -1255,6 +1303,117 @@ static int h_24x7_event_add(struct perf_event *event, int flags) return 0; } +/* + * 24x7 counters only support READ transactions. They are + * always counting and dont need/support ADD transactions. + * Cache the flags, but otherwise ignore transactions that + * are not PERF_PMU_TXN_READ. + */ +static void h_24x7_event_start_txn(struct pmu *pmu, unsigned int flags) +{ + struct hv_24x7_request_buffer *request_buffer; + struct hv_24x7_data_result_buffer *result_buffer; + + /* We should not be called if we are already in a txn */ + WARN_ON_ONCE(__this_cpu_read(hv_24x7_txn_flags)); + + __this_cpu_write(hv_24x7_txn_flags, flags); + if (flags & ~PERF_PMU_TXN_READ) + return; + + request_buffer = (void *)get_cpu_var(hv_24x7_reqb); + result_buffer = (void *)get_cpu_var(hv_24x7_resb); + + init_24x7_request(request_buffer, result_buffer); + + put_cpu_var(hv_24x7_resb); + put_cpu_var(hv_24x7_reqb); +} + +/* + * Clean up transaction state. + * + * NOTE: Ignore state of request and result buffers for now. + * We will initialize them during the next read/txn. + */ +static void reset_txn(void) +{ + __this_cpu_write(hv_24x7_txn_flags, 0); + __this_cpu_write(hv_24x7_txn_err, 0); +} + +/* + * 24x7 counters only support READ transactions. They are always counting + * and dont need/support ADD transactions. Clear ->txn_flags but otherwise + * ignore transactions that are not of type PERF_PMU_TXN_READ. + * + * For READ transactions, submit all pending 24x7 requests (i.e requests + * that were queued by h_24x7_event_read()), to the hypervisor and update + * the event counts. + */ +static int h_24x7_event_commit_txn(struct pmu *pmu) +{ + struct hv_24x7_request_buffer *request_buffer; + struct hv_24x7_data_result_buffer *result_buffer; + struct hv_24x7_result *resb; + struct perf_event *event; + u64 count; + int i, ret, txn_flags; + struct hv_24x7_hw *h24x7hw; + + txn_flags = __this_cpu_read(hv_24x7_txn_flags); + WARN_ON_ONCE(!txn_flags); + + ret = 0; + if (txn_flags & ~PERF_PMU_TXN_READ) + goto out; + + ret = __this_cpu_read(hv_24x7_txn_err); + if (ret) + goto out; + + request_buffer = (void *)get_cpu_var(hv_24x7_reqb); + result_buffer = (void *)get_cpu_var(hv_24x7_resb); + + ret = make_24x7_request(request_buffer, result_buffer); + if (ret) { + log_24x7_hcall(request_buffer, result_buffer, ret); + goto put_reqb; + } + + h24x7hw = &get_cpu_var(hv_24x7_hw); + + /* Update event counts from hcall */ + for (i = 0; i < request_buffer->num_requests; i++) { + resb = &result_buffer->results[i]; + count = be64_to_cpu(resb->elements[0].element_data[0]); + event = h24x7hw->events[i]; + h24x7hw->events[i] = NULL; + update_event_count(event, count); + } + + put_cpu_var(hv_24x7_hw); + +put_reqb: + put_cpu_var(hv_24x7_resb); + put_cpu_var(hv_24x7_reqb); +out: + reset_txn(); + return ret; +} + +/* + * 24x7 counters only support READ transactions. They are always counting + * and dont need/support ADD transactions. However, regardless of type + * of transaction, all we need to do is cleanup, so we don't have to check + * the type of transaction. + */ +static void h_24x7_event_cancel_txn(struct pmu *pmu) +{ + WARN_ON_ONCE(!__this_cpu_read(hv_24x7_txn_flags)); + reset_txn(); +} + static struct pmu h_24x7_pmu = { .task_ctx_nr = perf_invalid_context, @@ -1266,6 +1425,9 @@ static struct pmu h_24x7_pmu = { .start = h_24x7_event_start, .stop = h_24x7_event_stop, .read = h_24x7_event_read, + .start_txn = h_24x7_event_start_txn, + .commit_txn = h_24x7_event_commit_txn, + .cancel_txn = h_24x7_event_cancel_txn, }; static int hv_24x7_init(void) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 396351db601b..7d5e295255b7 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -676,6 +676,9 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type) if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) return -1; + if (branch_sample_type & PERF_SAMPLE_BRANCH_CALL) + return -1; + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; return pmu_bhrb_filter; |