summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-09-01perf hist: Remove needless ui/progress.h from hist.hArnaldo Carvalho de Melo4-1/+4
We only need a forward declaration, add it and fixup all the files that need ui_progress definitions but were wrongly getting it from hist.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-84a90o9jdxybffxo9jmouokw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf dsos: Move the dsos struct and its methods to separate source filesArnaldo Carvalho de Melo49-257/+331
So that we can reduce the header dependency tree further, in the process noticed that lots of places were getting even things like build-id routines and 'struct perf_tool' definition indirectly, so fix all those too. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ti0btma9ow5ndrytyoqdk62j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf symbols: Move symsrc prototypes to a separate headerArnaldo Carvalho de Melo10-33/+58
So that we can remove dso.h from symbol.h and reduce the header dependency tree. Fixup cases where struct dso guts are needed but were obtained via symbol.h, indirectly. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ip683cegt306ncu3gsz7ii21@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf symbols: Add missing linux/refcount.h to symbol.hArnaldo Carvalho de Melo1-0/+1
We use refcount_t there, so we need that header or else it'll break when we remove dso.h, that is from where it is getting that definition now... Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-5albrk0uve6x9cf6x3ebwpae@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf symbol: Move C++ demangle defines to the only file using itArnaldo Carvalho de Melo2-6/+6
No need to have it generally available in such a critical header as symbol.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-es1ufxv7bihiumytn5dm3drn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf dso: Adopt DSO related macros from symbol.hArnaldo Carvalho de Melo6-3/+7
Reducing the size of symbol.h by removing things that are better placed somewhere else. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-edenkmjt1oe5fks2s6umd30b@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01libtraceevent: Change users plugin directoryTzvetomir Stoyanov2-4/+4
To be compliant with XDG user directory layout, the user's plugin directory is changed from ~/.traceevent/plugins to ~/.local/lib/traceevent/plugins/ Suggested-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Patrick McLean <chutzpah@gentoo.org> Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/linux-trace-devel/20190313144206.41e75cf8@patrickm/ Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-4-tz.stoyanov@gmail.com Link: http://lore.kernel.org/lkml/20190805204355.344622683@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01libtraceevent: Remove tep_register_trace_clock()Tzvetomir Stoyanov3-14/+0
The tep_register_trace_clock() API is used to instruct the traceevent library how to print the event time stamps. As event print interface if redesigned, this API is not needed any more. The new event print API is flexible and the user can specify how the time stamps are printed. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Patrick McLean <chutzpah@gentoo.org> Cc: linux-trace-devel@vger.kernel.org Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-3-tz.stoyanov@gmail.com Link: http://lore.kernel.org/lkml/20190805204355.195042846@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01libtraceevent, perf tools: Changes in tep_print_event_* APIsTzvetomir Stoyanov7-200/+203
Libtraceevent APIs for printing various trace events information are complicated, there are complex extra parameters. To control the way event information is printed, the user should call a set of functions in a specific sequence. These APIs are reimplemented to provide a more simple interface for printing event information. Removed APIs: tep_print_event_task() tep_print_event_time() tep_print_event_data() tep_event_info() tep_is_latency_format() tep_set_latency_format() tep_data_latency_format() tep_set_print_raw() A new API for printing event information is introduced: void tep_print_event(struct tep_handle *tep, struct trace_seq *s, struct tep_record *record, const char *fmt, ...); where "fmt" is a printf-like format string, followed by the event fields to be printed. Supported fields: TEP_PRINT_PID, "%d" - event PID TEP_PRINT_CPU, "%d" - event CPU TEP_PRINT_COMM, "%s" - event command string TEP_PRINT_NAME, "%s" - event name TEP_PRINT_LATENCY, "%s" - event latency TEP_PRINT_TIME, %d - event time stamp. A divisor and precision can be specified as part of this format string: "%precision.divisord". Example: "%3.1000d" - divide the time by 1000 and print the first 3 digits before the dot. Thus, the time stamp "123456000" will be printed as "123.456" TEP_PRINT_INFO, "%s" - event information. TEP_PRINT_INFO_RAW, "%s" - event information, in raw format. Example: tep_print_event(tep, s, record, "%16s-%-5d [%03d] %s %6.1000d %s %s", TEP_PRINT_COMM, TEP_PRINT_PID, TEP_PRINT_CPU, TEP_PRINT_LATENCY, TEP_PRINT_TIME, TEP_PRINT_NAME, TEP_PRINT_INFO); Output: ls-11314 [005] d.h. 185207.366383 function __wake_up Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linux-trace-devel@vger.kernel.org Cc: Patrick McLean <chutzpah@gentoo.org> Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-2-tz.stoyanov@gmail.com Link: http://lore.kernel.org/lkml/20190805204355.041132030@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf event: Remove needless include directives from event.hArnaldo Carvalho de Melo8-6/+20
bpf.h and build-id.h are not needed at all in event.h, remove them. And fixup the fallout of files that were getting needed stuff from this now pruned include. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-rdm3dgtlrndmmnlc4bafsg3b@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf env: Remove env.h from other headers where just a fwd decl is neededArnaldo Carvalho de Melo5-3/+9
And fixup the fallout of c files not building due to now missing headers. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-sw8k3kpla98pr3rqypbjk9hf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-01perf debug: Remove needless include directives from debug.hArnaldo Carvalho de Melo75-6/+104
All we need there is a forward declaration for 'union perf_event', so remove it from there and add missing header directives in places using things from this indirect include. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31tools/power turbostat: update version numberLen Brown1-1/+1
Today is 19.08.31, at least in some parts of the world. Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPLPu Wen1-2/+7
Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL") and the commit 3316f99a9f1b68c578c5 ("tools/power turbostat: Also read package power on AMD F17h (Zen)") add AMD Fam 17h RAPL support. Hygon Family 18h(Dhyana) support RAPL in bit 14 of CPUID 0x80000007 EDX, and has MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT. So add Hygon Dhyana Family 18h support for RAPL. Already tested on Hygon multi-node systems and it shows correct per-core energy usage and the total package power. Signed-off-by: Pu Wen <puwen@hygon.cn> Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: Fix caller parameter of get_tdp_amd()Pu Wen1-1/+1
Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL") add a function get_tdp_amd(), the parameter is CPU family. But the rapl_probe_amd() function use wrong model parameter. Fix the wrong caller parameter of get_tdp_amd() to use family. Cc: <stable@vger.kernel.org> # v5.1+ Signed-off-by: Pu Wen <puwen@hygon.cn> Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: Fix CPU%C1 display valueSrinivas Pandruvada1-6/+17
In some case C1% will be wrong value, when platform doesn't have MSR for C1 residency. For example: Core CPU CPU%c1 - - 100.00 0 0 100.00 0 2 100.00 1 1 100.00 1 3 100.00 But adding Busy% will fix this Core CPU Busy% CPU%c1 - - 99.77 0.23 0 0 99.77 0.23 0 2 99.77 0.23 1 1 99.77 0.23 1 3 99.77 0.23 This issue can be reproduced on most of the recent systems including Broadwell, Skylake and later. This is because if we don't select Busy% or Avg_MHz or Bzy_MHz then mperf value will not be read from MSR, so it will be 0. But this is required for C1% calculation when MSR for C1 residency is not present. Same is true for C3, C6 and C7 column selection. So add another define DO_BIC_READ(), which doesn't depend on user column selection and use for mperf, C3, C6 and C7 related counters. So when there is no platform support for C1 residency counters, we still read these counters, if the CPU has support and user selected display of CPU%c1. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: do not enforce 1msArtem Bityutskiy1-5/+0
Turbostat works by taking a snapshot of counters, sleeping, taking another snapshot, calculating deltas, and printing out the table. The sleep time is controlled via -i option or by user sending a signal or a character to stdin. In the latter case, turbostat always adds 1 ms sleep before it reads the counters, in order to avoid larger imprecisions in the results in prints. While the 1 ms delay may be a good idea for a "dumb" user, it is a problem for an "aware" user. I do thousands and thousands of measurements over a short period of time (like 2ms), and turbostat unconditionally adds a 1ms to my interval, so I cannot get what I really need. This patch removes the unconditional 1ms sleep. This is an expert user tool, after all, and non-experts will unlikely ever use it in the non-fixed interval mode anyway, so I think it is OK to remove the 1ms delay. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: read from pipes tooArtem Bityutskiy1-4/+16
Commit '47936f944e78 tools/power turbostat: fix printing on input' make a valid fix, but it completely disabled piped stdin support, which is a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read anything from it, so it becomes impossible to get turbostat output at user-defined moments, instead of the regular intervals. There is no reason why this should works for terminals, but not for pipes. This patch improves the situation. Instead of ignoring pipes, we read data from them but gracefully handle the EOF case. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: Add Ice Lake NNPI supportRajneesh Bhardwaj1-0/+1
This enables turbostat utility on Ice Lake NNPI SoC. Link: https://lkml.org/lkml/2019/6/5/1034 Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: rename has_hsw_msrs()Len Brown1-4/+4
Perhaps if this more descriptive name had been used, then we wouldn't have had the HSW ULT vs HSW CORE bug, fixed by the previous commit. Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: Fix Haswell Core systemsLen Brown1-4/+6
turbostat: cpu0: msr offset 0x630 read failed: Input/output error because Haswell Core does not have C8-C10. Output C8-C10 only on Haswell ULT. Fixes: f5a4c76ad7de ("tools/power turbostat: consolidate duplicate model numbers") Reported-by: Prarit Bhargava <prarit@redhat.com> Suggested-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: add Jacobsville supportZhang Rui1-0/+3
Jacobsville behaves like Denverton. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: fix buffer overrunNaoya Horiguchi1-1/+1
turbostat could be terminated by general protection fault on some latest hardwares which (for example) support 9 levels of C-states and show 18 "tADDED" lines. That bloats the total output and finally causes buffer overrun. So let's extend the buffer to avoid this. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: fix file descriptor leaksGustavo A. R. Silva1-0/+1
Fix file descriptor leaks by closing fp before return. Addresses-Coverity-ID: 1444591 ("Resource leak") Addresses-Coverity-ID: 1444592 ("Resource leak") Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: fix leak of file descriptor on error return pathColin Ian King1-0/+1
Currently the error return path does not close the file fp and leaks a file descriptor. Fix this by closing the file. Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: Make interval calculation per thread to reduce jitterYazen Ghannam1-3/+10
Turbostat currently normalizes TSC and other values by dividing by an interval. This interval is the delta between the start of one global (all counters on all CPUs) sampling and the start of another. However, this introduces a lot of jitter into the data. In order to reduce jitter, the interval calculation should be based on timestamps taken per thread and close to the start of the thread's sampling. Define a per thread time value to hold the delta between samples taken on the thread. Use the timestamp taken at the beginning of sampling to calculate the delta. Move the thread's beginning timestamp to after the CPU migration to avoid jitter due to the migration. Use the global time delta for the average time delta. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power turbostat: remove duplicate pc10 columnLen Brown1-1/+0
Remove the duplicate pc10 column. Fixes: be0e54c4ebbf ("turbostat: Build-in "Low Power Idle" counters support") Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power x86_energy_perf_policy: Fix argument parsingZephaniah E. Loss-Cutler-Hull1-1/+1
The -w argument in x86_energy_perf_policy currently triggers an unconditional segfault. This is because the argument string reads: "+a:c:dD:E:e:f:m:M:rt:u:vw" and yet the argument handler expects an argument. When parse_optarg_string is called with a null argument, we then proceed to crash in strncmp, not horribly friendly. The man page describes -w as taking an argument, the long form (--hwp-window) is correctly marked as taking a required argument, and the code expects it. As such, this patch simply marks the short form (-w) as requiring an argument. Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power: Fix typo in man pageMatt Lupfer1-1/+1
From context, we mean EPB (Enegry Performance Bias). Signed-off-by: Matt Lupfer <mlupfer@ddn.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power/x86: Enable compiler optimisations and Fortify by defaultBen Hutchings2-2/+4
Compiling without optimisations is silly, especially since some warnings depend on the optimiser. Use -O2. Fortify adds warnings for unchecked I/O (among other things), which seems to be a good idea for user-space code. Enable that too. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2Ben Hutchings1-11/+15
x86_energy_perf_policy first uses __get_cpuid() to check the maximum CPUID level and exits if it is too low. It then assumes that later calls will succeed (which I think is architecturally guaranteed). It also assumes that CPUID works at all (which is not guaranteed on x86_32). If optimisations are enabled, gcc warns about potentially uninitialized variables. Fix this by adding an exit-on-error after every call to __get_cpuid() instead of just checking the maximum level. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31Merge branch 'i2c/for-current' of ↵Linus Torvalds7-14/+34
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C has a bunch of driver fixes and a core improvement to make the on-going API transition more robust" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mediatek: disable zero-length transfers for mt8183 i2c: iproc: Stop advertising support of SMBUS quick cmd MAINTAINERS: i2c mv64xxx: Update documentation path i2c: piix4: Fix port selection for AMD Family 16h Model 30h i2c: designware: Synchronize IRQs when unregistering slave client i2c: i801: Avoid memory leak in check_acpi_smo88xx_device() i2c: make i2c_unregister_device() ERR_PTR safe
2019-08-31Merge tag 'trace-v5.3-rc6' of ↵Linus Torvalds6-15/+35
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Small fixes and minor cleanups for tracing: - Make exported ftrace function not static - Fix NULL pointer dereference in reading probes as they are created - Fix NULL pointer dereference in k/uprobe clean up path - Various documentation fixes" * tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Correct kdoc formats ftrace/x86: Remove mcount() declaration tracing/probe: Fix null pointer dereference tracing: Make exported ftrace_set_clr_event non-static ftrace: Check for successful allocation of hash ftrace: Check for empty hash and comment the race with registering probes ftrace: Fix NULL pointer dereference in t_probe_next()
2019-08-31Merge tag 'riscv/for-v5.3-rc7' of ↵Linus Torvalds2-6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Paul Walmsley: "One significant fix for 32-bit RISC-V systems: Fix the RV32 memory map to prevent userspace from corrupting the FIXMAP area. Without this patch, the system can crash very early during the boot" * tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Fix FIXMAP area corruption on RV32 systems
2019-08-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-16/+19
Pull KVM fixes from Radim Krčmář: "PPC: - Fix bug which could leave locks held in the host on return to a guest. x86: - Prevent infinitely looping emulation of a failing syscall while single stepping. - Do not crash the host when nesting is disabled" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Don't update RIP or do single-step on faulting emulation KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
2019-08-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-22/+47
Merge misc mm fixes from Andrew Morton: "7 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: memcontrol: fix percpu vmstats and vmevents flush mm, memcg: do not set reclaim_state on soft limit reclaim mailmap: add aliases for Dmitry Safonov mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones" mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n mm: memcontrol: flush percpu slab vmstats on kmem offlining
2019-08-31tracing: Correct kdoc formatsJakub Kicinski1-12/+14
Fix the following kdoc warnings: kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single' kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single' kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single' kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer' kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo' kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo' kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk' kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk' kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk' Link: http://lkml.kernel.org/r/20190828052549.2472-2-jakub.kicinski@netronome.com Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31ftrace/x86: Remove mcount() declarationJisheng Zhang1-1/+0
Commit 562e14f72292 ("ftrace/x86: Remove mcount support") removed the support for using mcount, so we could remove the mcount() declaration to clean up. Link: http://lkml.kernel.org/r/20190826170150.10f101ba@xhacker.debian Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/probe: Fix null pointer dereferenceXinpeng Liu1-1/+2
BUG: KASAN: null-ptr-deref in trace_probe_cleanup+0x8d/0xd0 Read of size 8 at addr 0000000000000000 by task syz-executor.0/9746 trace_probe_cleanup+0x8d/0xd0 free_trace_kprobe.part.14+0x15/0x50 alloc_trace_kprobe+0x23e/0x250 Link: http://lkml.kernel.org/r/1565220563-980-1-git-send-email-danielliu861@gmail.com Fixes: e3dc9f898ef9c ("tracing/probe: Add trace_event_call accesses APIs") Signed-off-by: Xinpeng Liu <danielliu861@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing: Make exported ftrace_set_clr_event non-staticDenis Efremov2-1/+2
The function ftrace_set_clr_event is declared static and marked EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the function was decided to be a part of API, this commit removes the static attribute and adds the declaration to the header. Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances") Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31Merge branch 'linus' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a potential crash in the ccp driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ccp - Ignore unconfigured CCP device on suspend/resume
2019-08-31Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"Linus Torvalds1-1/+2
Commit dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()") made the kfifo code round the number of elements up. That was good for __kfifo_alloc(), but it's actually wrong for __kfifo_init(). The difference? __kfifo_alloc() will allocate the rounded-up number of elements, but __kfifo_init() uses an allocation done by the caller. We can't just say "use more elements than the caller allocated", and have to round down. The good news? All the normal cases will be using power-of-two arrays anyway, and most users of kfifo's don't use kfifo_init() at all, but one of the helper macros to declare a KFIFO that enforce the proper power-of-two behavior. But it looks like at least ibmvscsis might be affected. The bad news? Will Deacon refers to an old thread and points points out that the memory ordering in kfifo's is questionable. See https://lore.kernel.org/lkml/20181211034032.32338-1-yuleixzhang@tencent.com/ for more. Fixes: dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()") Reported-by: laokz <laokz@foxmail.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mm: memcontrol: fix percpu vmstats and vmevents flushShakeel Butt1-5/+5
Instead of using raw_cpu_read() use per_cpu() to read the actual data of the corresponding cpu otherwise we will be reading the data of the current cpu for the number of online CPUs. Link: http://lkml.kernel.org/r/20190829203110.129263-1-shakeelb@google.com Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg") Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mm, memcg: do not set reclaim_state on soft limit reclaimMichal Hocko1-2/+3
Adric Blake has noticed[1] the following warning: WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245 set_task_reclaim_state+0x1e/0x40 [...] Call Trace: mem_cgroup_shrink_node+0x9b/0x1d0 mem_cgroup_soft_limit_reclaim+0x10c/0x3a0 balance_pgdat+0x276/0x540 kswapd+0x200/0x3f0 ? wait_woken+0x80/0x80 kthread+0xfd/0x130 ? balance_pgdat+0x540/0x540 ? kthread_park+0x80/0x80 ret_from_fork+0x35/0x40 ---[ end trace 727343df67b2398a ]--- which tells us that soft limit reclaim is about to overwrite the reclaim_state configured up in the call chain (kswapd in this case but the direct reclaim is equally possible). This means that reclaim stats would get misleading once the soft reclaim returns and another reclaim is done. Fix the warning by dropping set_task_reclaim_state from the soft reclaim which is always called with reclaim_state set up. [1] http://lkml.kernel.org/r/CAE1jjeePxYPvw1mw2B3v803xHVR_BNnz0hQUY_JDMN8ny29M6w@mail.gmail.com Link: http://lkml.kernel.org/r/20190828071808.20410-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Adric Blake <promarbler14@gmail.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mailmap: add aliases for Dmitry SafonovDmitry Safonov1-0/+3
I don't work for Virtuozzo or Samsung anymore and I've noticed that they have started sending annoying html email-replies. And I prioritize my personal emails over work email box, so while at it add an entry for Arista too - so I can reply faster when needed. Link: http://lkml.kernel.org/r/20190827220346.11123-1-dima@arista.com Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolateGustavo A. R. Silva1-0/+1
Fix lock/unlock imbalance by unlocking *zhdr* before return. Addresses Coverity ID 1452811 ("Missing unlock") Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync ↵Roman Gushchin1-5/+3
with the hierarchical ones" Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones") effectively decreased the precision of per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters. That's good for displaying in memory.stat, but brings a serious regression into the reclaim process. One issue I've discovered and debugged is the following: lruvec_lru_size() can return 0 instead of the actual number of pages in the lru list, preventing the kernel to reclaim last remaining pages. Result is yet another dying memory cgroups flooding. The opposite is also happening: scanning an empty lru list is the waste of cpu time. Also, inactive_list_is_low() can return incorrect values, preventing the active lru from being scanned and freed. It can fail both because the size of active and inactive lists are inaccurate, and because the number of workingset refaults isn't precise. In other words, the result is pretty random. I'm not sure, if using the approximate number of slab pages in count_shadow_number() is acceptable, but issues described above are enough to partially revert the patch. Let's keep per-memcg vmstat_local batched (they are only used for displaying stats to the userspace), but keep lruvec stats precise. This change fixes the dead memcg flooding on my setup. Link: http://lkml.kernel.org/r/20190817004726.2530670-1-guro@fb.com Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mm/zsmalloc.c: fix build when CONFIG_COMPACTION=nAndrew Morton1-0/+2
Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool") Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com Reported-by: kbuild test robot <lkp@intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-31mm: memcontrol: flush percpu slab vmstats on kmem offliningRoman Gushchin2-10/+30
I've noticed that the "slab" value in memory.stat is sometimes 0, even if some children memory cgroups have a non-zero "slab" value. The following investigation showed that this is the result of the kmem_cache reparenting in combination with the per-cpu batching of slab vmstats. At the offlining some vmstat value may leave in the percpu cache, not being propagated upwards by the cgroup hierarchy. It means that stats on ancestor levels are lower than actual. Later when slab pages are released, the precise number of pages is substracted on the parent level, making the value negative. We don't show negative values, 0 is printed instead. To fix this issue, let's flush percpu slab memcg and lruvec stats on memcg offlining. This guarantees that numbers on all ancestor levels are accurate and match the actual number of outstanding slab pages. Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal") Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30ftrace: Check for successful allocation of hashNaveen N. Rao1-0/+5
In register_ftrace_function_probe(), we are not checking the return value of alloc_and_copy_ftrace_hash(). The subsequent call to ftrace_match_records() may end up dereferencing the same. Add a check to ensure this doesn't happen. Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>