starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2017-01-31	perf ftrace: Add ftrace.tracer config option	Taeung Song	1	-0/+25
	Currently 'perf ftrace' command allows selecting 'function_graph' or 'function', defaulting to 'function_graph'. Add ftrace.tracer config option to select the default tracer: # cat ~/.perfconfig [ftrace] tracer = function # perf ftrace usleep 123456 \| head -10 <...>-14450 [002] d... 10089.284231: finish_task_switch <-__schedule <...>-14450 [002] .... 10089.284232: finish_wait <-pipe_wait <...>-14450 [002] .... 10089.284232: mutex_lock <-pipe_wait <...>-14450 [002] .... 10089.284232: _cond_resched <-mutex_lock Committer notes: Retesting it with invalid variables, invalid values for ftrace.tracer, and a valid one: # cat ~/.perfconfig [ftrace] trace = function # perf ftrace usleep 1 Error: wrong config key-value pair ftrace.trace=function # cat ~/.perfconfig [ftrace] tracer = functin # perf ftrace usleep 1 Please select "function_graph" (default) or "function" Error: wrong config key-value pair ftrace.tracer=functin # cat ~/.perfconfig [ftrace] tracer = function # perf ftrace usleep 1 \| head -5 <idle>-0 [000] d... 3855.820847: switch_mm_irqs_off <-__schedule <...>-18550 [000] d... 3855.820849: finish_task_switch <-__schedule <...>-18550 [000] d... 3855.820851: smp_irq_work_interrupt <-irq_work_interrupt <...>-18550 [000] d... 3855.820851: irq_enter <-smp_irq_work_interrupt <...>-18550 [000] d... 3855.820851: rcu_irq_enter <-irq_enter # Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1485862711-20216-3-git-send-email-treeze.taeung@gmail.com [ Added missign space in error message, changed the logic to make it more compact and less error prone ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	perf tools: Create for_each_event macro for tracepoints iteration	Taeung Song	1	-20/+18
	Similar to for_each_subsystem and for_each_event in util/parse-events.c, add new macro 'for_each_event' for easy iteration over the tracepoints in order to be more compact and readable. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1485862711-20216-2-git-send-email-treeze.taeung@gmail.com [ Slight change to keep existing style for checking strcmp() return ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	perf test: Add libbpf pinning test	Joe Stringer	1	-1/+41
	Add a test for the newly added BPF object pinning functionality. For example: # tools/perf/perf test 37 37: BPF filter : 37.1: Basic BPF filtering : Ok 37.2: BPF pinning : Ok 37.3: BPF prologue generation : Ok 37.4: BPF relocation checker : Ok # tools/perf/perf test 37 -v 2>&1 \| grep pinned libbpf: pinned map '/sys/fs/bpf/perf_test/flip_table' libbpf: pinned program '/sys/fs/bpf/perf_test/func=SyS_epoll_wait/0' Signed-off-by: Joe Stringer <joe@ovn.org> Requested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170126212001.14103-7-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	tools lib api fs: Add bpf_fs filesystem detector	Joe Stringer	2	-0/+17
	Allow mounting of the BPF filesystem at /sys/fs/bpf. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170126212001.14103-6-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	tools perf util: Make rm_rf(path) argument const	Joe Stringer	2	-2/+2
	rm_rf() doesn't modify its path argument, and a future caller will pass a string constant into it to delete. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170126212001.14103-5-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	tools lib bpf: Add bpf_object__pin()	Joe Stringer	2	-0/+54
	Add a new API to pin a BPF object to the filesystem. The user can specify the path within a BPF filesystem to pin the object. Programs will be pinned under a subdirectory named the same as the program, with each instance appearing as a numbered file under that directory, and maps will be pinned under the path using the name of the map as the file basename. For example, with the directory '/sys/fs/bpf/foo' and a BPF object which contains two instances of a program named 'bar', and a map named 'baz': /sys/fs/bpf/foo/bar/0 /sys/fs/bpf/foo/bar/1 /sys/fs/bpf/foo/baz Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170126212001.14103-4-joe@ovn.org [ Check snprintf >= for truncation, as snprintf(bf, size, ...) == size also means truncation ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	tools lib bpf: Add bpf_map__pin()	Joe Stringer	2	-0/+23
	Add a new API to pin a BPF map to the filesystem. The user can specify the path full path within a BPF filesystem to pin the map. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170126212001.14103-3-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	tools lib bpf: Add BPF program pinning APIs	Joe Stringer	2	-0/+123
	Add new APIs to pin a BPF program (or specific instances) to the filesystem. The user can specify the path full path within a BPF filesystem to pin the program. bpf_program__pin_instance(prog, path, n) will pin the nth instance of 'prog' to the specified path. bpf_program__pin(prog, path) will create the directory 'path' (if it does not exist) and pin each instance within that directory. For instance, path/0, path/1, path/2. Committer notes: - Add missing headers for mkdir() - Check strdup() for failure - Check snprintf >= size, not >, as == also means truncated, see 'man snprintf', return value. - Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older systems and we're not yet having a tools/include/uapi/linux/magic.h copy. - Do not include linux/magic.h, not present in older distros. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31	perf callchain: Reference count maps	Krister Johansen	3	-2/+22
	If dso__load_kcore frees all of the existing maps, but one has already been attached to a callchain cursor node, then we can get a SIGSEGV in any function that happens to try to use this invalid cursor. Use the existing map refcount mechanism to forestall cleanup of a map until the cursor iterates past the node. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@kernel.org Fixes: 84c2cafa2889 ("perf tools: Reference count struct map") Link: http://lkml.kernel.org/r/20170106062331.GB2707@templeofstupid.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-30	tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h, ↵	Ingo Molnar	3	-0/+25
	{tools/,}arch/x86/include/asm/cpufeatures.h and {tools/,}arch/arm/include/uapi/asm/kvm.h The following upstream headers were updated: - The x86 cpufeatures.h file picked up a couple of new feature entries - The PowerPC and ARM KVM headers picked up new features None of which requires changes to perf tooling, so refresh the tooling copy. Solves these build time warnings: Warning: arch/x86/include/asm/cpufeatures.h differs from kernel Warning: arch/powerpc/include/uapi/asm/kvm.h differs from kernel Warning: arch/arm/include/uapi/asm/kvm.h differs from kernel Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170130081131.GA8322@gmail.com [ resync tools/arch/x86/include/asm/cpufeatures.h ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27	perf tools: Propagate perf_config() errors	Arnaldo Carvalho de Melo	12	-23/+62
	Previously these were being ignored, sometimes silently. Stop doing that, emitting debug messages and handling the errors. Testing it: $ cat ~/.perfconfig cat: /home/acme/.perfconfig: No such file or directory $ perf stat -e cycles usleep 1 Performance counter stats for 'usleep 1': 938,996 cycles:u 0.003813731 seconds time elapsed $ perf top --stdio Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, <SNIP> [ perf record: Captured and wrote 0.019 MB perf.data (7 samples) ] [acme@jouet linux]$ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # Overhead Command Shared Object Symbol # ........ ....... ................. ......................... 71.77% usleep libc-2.24.so [.] _dl_addr 27.07% usleep ld-2.24.so [.] _dl_next_ld_env_entry 1.13% usleep [kernel.kallsyms] [k] page_fault $ $ touch ~/.perfconfig $ ls -la ~/.perfconfig -rw-rw-r--. 1 acme acme 0 Jan 27 12:14 /home/acme/.perfconfig $ $ perf stat -e instructions usleep 1 Performance counter stats for 'usleep 1': 244,610 instructions:u 0.000805383 seconds time elapsed $ [root@jouet ~]# chown acme.acme ~/.perfconfig [root@jouet ~]# perf stat -e cycles usleep 1 Warning: File /root/.perfconfig not owned by current user or root, ignoring it. Performance counter stats for 'usleep 1': 937,615 cycles 0.000836931 seconds time elapsed # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-j2rq96so6xdqlr8p8rd6a3jx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27	perf config: Do not consider an error not to have any perfconfig file	Arnaldo Carvalho de Melo	1	-6/+8
	While propagating the errors from perf_config(), which were being completely ignored, everything stopped working for people without a ~/.perfconfig file, because the perf_config_set__init() was considering an error not to have a .perfconfig file, duh, fix it by checking the errno after the failed stat() call. It should also not return an error when it says it is ignoring the file, and also a empty file should not return an error either. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 8beeb00f2c84 ("perf config: Use new perf_config_set__init() to initialize config set") Link: http://lkml.kernel.org/n/tip-ygpbab3apbs6l8wr97xedwks@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	tools build: Add tools tree support for 'make -s'	Josh Poimboeuf	3	-3/+25
	When doing a kernel build with 'make -s', everything is silenced except the objtool build. That's because the tools tree support for silent builds is some combination of missing and broken. Three changes are needed to fix it: - Makefile: propagate '-s' to the sub-make's MAKEFLAGS variable so the tools Makefiles can see it. - tools/scripts/Makefile.include: fix the tools Makefiles' ability to recognize '-s'. The MAKE_VERSION and MAKEFLAGS checks are copied from the top-level Makefile. This silences the "DESCEND objtool" message. - tools/build/Makefile.build: add support to the tools Build files for recognizing '-s'. Again the MAKE_VERSION and MAKEFLAGS checks are copied from the top-level Makefile. This silences all the object compile/link messages. Reported-and-Tested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michal Marek <mmarek@suse.com> Link: http://lkml.kernel.org/r/e8967562ef640c3ae9a76da4ae0f4e47df737c34.1484799200.git.jpoimboe@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf ftrace: Remove needless code setting default tracer	Taeung Song	1	-4/+1
	As a result of commit a3497642c261 ("perf ftrace: Make 'function_graph' be the default tracer") the ftrace.tracer variable can't be NULL but the other code setting default tracer remained. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1485423339-22780-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	Merge tag 'perf-core-for-mingo-4.11-20170126' of ↵	Ingo Molnar	22	-96/+600
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull the latest perf/core updates from Arnaldo Carvalho de Melo: New features: - Introduce 'perf ftrace' a perf front end to the kernel's ftrace function and function_graph tracer, defaulting to the "function_graph" tracer, more work will be done in reviving this effort, forward porting it from its initial patch submission (Namhyung Kim) - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa) Fixes: - Fix wrong register name for arm64, used in 'perf probe' (He Kuang) - Fix map offsets in relocation in libbpf (Joe Stringer) - Fix looking up dwarf unwind stack info (Matija Glavinic Pecotic) Infrastructure changes: - libbpf prog functions sync with what is exported via uapi (Joe Stringer) Trivial changes: - Remove unnecessary checks and assignments in 'perf probe's try_to_find_absolute_address() (Markus Elfring) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-26	perf ftrace: Make 'function_graph' be the default tracer	Arnaldo Carvalho de Melo	1	-1/+2
	So that we can suppress the '-t function_graph' and get a more compact command line: # perf ftrace usleep 123456 \| grep raw_spin_lock \| sort -k2 -nr \| head -5 2) 0.555 us \| _raw_spin_lock(); 2) 0.516 us \| _raw_spin_lock(); 2) 0.410 us \| _raw_spin_lock_irq(); 2) 0.374 us \| _raw_spin_lock_irqsave(); # Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ss9xgx5htpxcv86x42pnh3m6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf ftrace: Introduce new 'ftrace' tool	Namhyung Kim	6	-0/+282
	The 'perf ftrace' command is a simple wrapper of kernel's ftrace functionality. It only supports single thread tracing currently and just reads trace_pipe in text and then write it to stdout. Committer notes: Testing it: # perf ftrace -f function_graph usleep 123456 <SNIP> 2) \| SyS_nanosleep() { 2) \| _copy_from_user() { <SNIP> 2) 0.900 us \| } 2) 1.354 us \| } 2) \| hrtimer_nanosleep() { 2) 0.062 us \| __hrtimer_init(); 2) \| do_nanosleep() { 2) \| hrtimer_start_range_ns() { <SNIP> 2) 5.025 us \| } 2) \| schedule() { 2) 0.125 us \| rcu_note_context_switch(); 2) 0.057 us \| _raw_spin_lock(); 2) \| deactivate_task() { 2) 0.369 us \| update_rq_clock.part.77(); 2) \| dequeue_task_fair() { <SNIP> 2) + 22.453 us \| } 2) + 23.736 us \| } 2) \| pick_next_task_fair() { <SNIP> 2) + 47.167 us \| } 2) \| pick_next_task_idle() { <SNIP> 2) 4.462 us \| } ------------------------------------------ 2) usleep-20387 => <idle>-0 ------------------------------------------ 2) 0.806 us \| switch_mm_irqs_off(); ------------------------------------------ 2) <idle>-0 => usleep-20387 ------------------------------------------ 2) 0.151 us \| finish_task_switch(); 2) @ 123597.2 us \| } 2) 0.037 us \| _cond_resched(); 2) \| hrtimer_try_to_cancel() { 2) 0.064 us \| hrtimer_active(); 2) 0.353 us \| } 2) @ 123605.3 us \| } 2) @ 123606.2 us \| } 2) @ 123608.3 us \| } /* SyS_nanosleep */ 2) \| __do_page_fault() { <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com>, Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-r1hgmsj4dxny8arn3o9mw512@git.kernel.org [ Various foward port fixes, add man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf util: Add more debug message on failure path	Namhyung Kim	2	-18/+39
	It's helpful for debugging on tracing features. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com>, Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-rjysr9ljiesymgk4qblteaty@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf util: Save pid-cmdline mapping into tracing header	Namhyung Kim	4	-3/+84
	Current trace info data lacks the saved cmdline mapping which is needed for pevent to find out the comm of a task. Add this and bump up the version number so that perf can determine its presence when reading. This is mostly corresponding to trace.dat file version 6, but still lacks 4 byte of number of cpus, and 10 bytes of type string - and I think we don't need those anyway. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com>, Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> [ Change version test from == to >= ] Link: http://lkml.kernel.org/n/tip-vaooqpxsikxbb3359p0corcb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf scripting perl: Do not die() when not founding event for a type	Arnaldo Carvalho de Melo	1	-2/+4
	Do just like handling other cases i.e. print some debug message and ignore the sample. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-t7kzlm3cxyvbd7d9n9554ai9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	tools lib bpf: Add libbpf_get_error()	Joe Stringer	3	-2/+12
	This function will turn a libbpf pointer into a standard error code (or 0 if the pointer is valid). This also allows removal of the dependency on linux/err.h in the public header file, which causes problems in userspace programs built against libbpf. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170123011128.26534-5-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	tools lib bpf: Add set/is helpers for all prog types	Joe Stringer	2	-0/+15
	These bpf_prog_types were exposed in the uapi but there were no corresponding functions to set these types for programs in libbpf. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170123011128.26534-4-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	tools lib bpf: Define prog_type fns with macro	Joe Stringer	1	-25/+16
	Turning this into a macro allows future prog types to be added with a single line per type. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20170123011128.26534-3-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	tools lib bpf: Fix map offsets in relocation	Joe Stringer	1	-3/+12
	Commit 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") attempted to fix map resolution by identifying the number of symbols that point to maps, and using this number to resolve each of the maps. However, during relocation the original definition of the map size was still in use. For up to two maps, the calculation was correct if there was a small difference in size between the map definition in libbpf and the one that the client library uses. However if the difference was large, particularly if more than two maps were used in the BPF program, the relocation would fail. For example, when using a map definition with size 28, with three maps, map relocation would count: (sym_offset / sizeof(struct bpf_map_def) => map_idx) (0 / 16 => 0), ie map_idx = 0 (28 / 16 => 1), ie map_idx = 1 (56 / 16 => 3), ie map_idx = 3 So, libbpf reports: libbpf: bpf relocation: map_idx 3 large than 2 Fix map relocation by checking the exact offset of maps when doing relocation. Signed-off-by: Joe Stringer <joe@ovn.org> [Allow different map size in an object] Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: netdev@vger.kernel.org Fixes: 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") Link: http://lkml.kernel.org/r/20170123011128.26534-2-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf probe: Delete an unnecessary assignment in try_to_find_absolute_address()	Markus Elfring	1	-3/+2
	Remove an error code assignment which is redundant in an if branch for the handling of a memory allocation failure because the same value was set for the local variable "err" before. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/0ede09ec-79b6-c8bd-5b20-02c63ed98aab@users.sourceforge.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf probe: Delete an unnecessary check in try_to_find_absolute_address()	Markus Elfring	1	-4/+2
	Remove a condition check which is unnecessary at the end because this source code place should usually only be reached with a non-zero pointer. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/a3f2473b-6383-a326-bce0-b826423608b8@users.sourceforge.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-26	perf probe: Fix wrong register name for arm64	He Kuang	1	-6/+6
	The register name of arm64 architecture is x0-x31 not r0-r31, this patch changes this typo. Before this patch: # perf probe --definition 'sys_write count' p:probe/sys_write _text+1502872 count=%r2:s64 # echo 'p:probe/sys_write _text+1502872 count=%r2:s64' > \ /sys/kernel/debug/tracing/kprobe_events Parse error at argument[0]. (-22) After this patch: # perf probe --definition 'sys_write count' p:probe/sys_write _text+1502872 count=%x2:s64 # echo 'p:probe/sys_write _text+1502872 count=%x2:s64' > \ /sys/kernel/debug/tracing/kprobe_events # echo 1 >/sys/kernel/debug/tracing/events/probe/enable # cat /sys/kernel/debug/tracing/trace ... sh-422 [000] d... 650.495930: sys_write: (SyS_write+0x0/0xc8) count=22 sh-422 [000] d... 651.102389: sys_write: (SyS_write+0x0/0xc8) count=26 sh-422 [000] d... 651.358653: sys_write: (SyS_write+0x0/0xc8) count=86 Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bintian Wang <bintian.wang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170124103015.1936-2-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-25	Merge branch 'linus' into perf/core, to pick up fixes	Ingo Molnar	860	-5016/+8228
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	28	-78/+237
	Merge fixes from Andrew Morton: "26 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits) MAINTAINERS: add Dan Streetman to zbud maintainers MAINTAINERS: add Dan Streetman to zswap maintainers mm: do not export ioremap_page_range symbol for external module mn10300: fix build error of missing fpu_save() romfs: use different way to generate fsid for BLOCK or MTD frv: add missing atomic64 operations mm, page_alloc: fix premature OOM when racing with cpuset mems update mm, page_alloc: move cpuset seqcount checking to slowpath mm, page_alloc: fix fast-path race with cpuset update or removal mm, page_alloc: fix check for NULL preferred_zone kernel/panic.c: add missing \n fbdev: color map copying bounds checking frv: add atomic64_add_unless() mm/mempolicy.c: do not put mempolicy before using its nodemask radix-tree: fix private list warnings Documentation/filesystems/proc.txt: add VmPin mm, memcg: do not retry precharge charges proc: add a schedule point in proc_pid_readdir() mm: alloc_contig: re-allow CMA to compact FS pages mm/slub.c: trace free objects at KERN_INFO ...
2017-01-25	MAINTAINERS: add Dan Streetman to zbud maintainers	Dan Streetman	1	-0/+1
	Add myself as zbud maintainer. Link: http://lkml.kernel.org/r/20170124221705.26523-1-ddstreet@ieee.org Signed-off-by: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	MAINTAINERS: add Dan Streetman to zswap maintainers	Dan Streetman	1	-0/+1
	Add myself as zswap maintainer. Link: http://lkml.kernel.org/r/20170124212200.19052-1-ddstreet@ieee.org Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm: do not export ioremap_page_range symbol for external module	zhong jiang	1	-1/+0
	Recently, I've found cases in which ioremap_page_range was used incorrectly, in external modules, leading to crashes. This can be partly attributed to the fact that ioremap_page_range is lower-level, with fewer protections, as compared to the other functions that an external module would typically call. Those include: ioremap_cache ioremap_nocache ioremap_prot ioremap_uc ioremap_wc ioremap_wt ...each of which wraps __ioremap_caller, which in turn provides a safer way to achieve the mapping. Therefore, stop EXPORT-ing ioremap_page_range. Link: http://lkml.kernel.org/r/1485173220-29010-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Suggested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mn10300: fix build error of missing fpu_save()	Randy Dunlap	1	-1/+1
	When CONFIG_FPU is not enabled on arch/mn10300, <asm/switch_to.h> causes a build error with a call to fpu_save(): kernel/built-in.o: In function `.L410': core.c:(.sched.text+0x28a): undefined reference to `fpu_save' Fix this by including <asm/fpu.h> in <asm/switch_to.h> so that an empty static inline fpu_save() is defined. Link: http://lkml.kernel.org/r/dc421c4f-4842-4429-1b99-92865c2f24b6@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	romfs: use different way to generate fsid for BLOCK or MTD	Coly Li	1	-1/+22
	Commit 8a59f5d25265 ("fs/romfs: return f_fsid for statfs(2)") generates a 64bit id from sb->s_bdev->bd_dev. This is only correct when romfs is defined with CONFIG_ROMFS_ON_BLOCK. If romfs is only defined with CONFIG_ROMFS_ON_MTD, sb->s_bdev is NULL, referencing sb->s_bdev->bd_dev will triger an oops. Richard Weinberger points out that when CONFIG_ROMFS_BACKED_BY_BOTH=y, both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD are defined. Therefore when calling huge_encode_dev() to generate a 64bit id, I use the follow order to choose parameter, - CONFIG_ROMFS_ON_BLOCK defined use sb->s_bdev->bd_dev - CONFIG_ROMFS_ON_BLOCK undefined and CONFIG_ROMFS_ON_MTD defined use sb->s_dev when, - both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD undefined leave id as 0 When CONFIG_ROMFS_ON_MTD is defined and sb->s_mtd is not NULL, sb->s_dev is set to a device ID generated by MTD_BLOCK_MAJOR and mtd index, otherwise sb->s_dev is 0. This is a try-best effort to generate a uniq file system ID, if all the above conditions are not meet, f_fsid of this romfs instance will be 0. Generally only one romfs can be built on single MTD block device, this method is enough to identify multiple romfs instances in a computer. Link: http://lkml.kernel.org/r/1482928596-115155-1-git-send-email-colyli@suse.de Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Nong Li <nongli1031@gmail.com> Tested-by: Nong Li <nongli1031@gmail.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	frv: add missing atomic64 operations	Sudip Mukherjee	1	-1/+18
	Some more atomic64 operations were missing and as a result frv allmodconfig was failing. Add the missing operations. Link: http://lkml.kernel.org/r/1485193844-12850-1-git-send-email-sudip.mukherjee@codethink.co.uk Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm, page_alloc: fix premature OOM when racing with cpuset mems update	Vlastimil Babka	1	-11/+24
	Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers OOM killer in few seconds, despite lots of free memory. The test attempts to repeatedly fault in memory in one process in a cpuset, while changing allowed nodes of the cpuset between 0 and 1 in another process. The problem comes from insufficient protection against cpuset changes, which can cause get_page_from_freelist() to consider all zones as non-eligible due to nodemask and/or current->mems_allowed. This was masked in the past by sufficient retries, but since commit 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator") we fix the preferred_zoneref once, and don't iterate over the whole zonelist in further attempts, thus the only eligible zones might be placed in the zonelist before our starting point and we always miss them. A previous patch fixed this problem for current->mems_allowed. However, cpuset changes also update the task's mempolicy nodemask. The fix has two parts. We have to repeat the preferred_zoneref search when we detect cpuset update by way of seqcount, and we have to check the seqcount before considering OOM. [akpm@linux-foundation.org: fix typo in comment] Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm, page_alloc: move cpuset seqcount checking to slowpath	Vlastimil Babka	1	-21/+26
	This is a preparation for the following patch to make review simpler. While the primary motivation is a bug fix, this also simplifies the fast path, although the moved code is only enabled when cpusets are in use. Link: http://lkml.kernel.org/r/20170120103843.24587-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm, page_alloc: fix fast-path race with cpuset update or removal	Vlastimil Babka	1	-1/+9
	Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers OOM killer in few seconds, despite lots of free memory. The test attempts to repeatedly fault in memory in one process in a cpuset, while changing allowed nodes of the cpuset between 0 and 1 in another process. One possible cause is that in the fast path we find the preferred zoneref according to current mems_allowed, so that it points to the middle of the zonelist, skipping e.g. zones of node 1 completely. If the mems_allowed is updated to contain only node 1, we never reach it in the zonelist, and trigger OOM before checking the cpuset_mems_cookie. This patch fixes the particular case by redoing the preferred zoneref search if we switch back to the original nodemask. The condition is also slightly changed so that when the last non-root cpuset is removed, we don't miss it. Note that this is not a full fix, and more patches will follow. Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm, page_alloc: fix check for NULL preferred_zone	Vlastimil Babka	2	-2/+6
	Patch series "fix premature OOM regression in 4.7+ due to cpuset races". This is v2 of my attempt to fix the recent report based on LTP cpuset stress test [1]. The intention is to go to stable 4.9 LTSS with this, as triggering repeated OOMs is not nice. That's why the patches try to be not too intrusive. Unfortunately why investigating I found that modifying the testcase to use per-VMA policies instead of per-task policies will bring the OOM's back, but that seems to be much older and harder to fix problem. I have posted a RFC [2] but I believe that fixing the recent regressions has a higher priority. Longer-term we might try to think how to fix the cpuset mess in a better and less error prone way. I was for example very surprised to learn, that cpuset updates change not only task->mems_allowed, but also nodemask of mempolicies. Until now I expected the parameter to alloc_pages_nodemask() to be stable. I wonder why do we then treat cpusets specially in get_page_from_freelist() and distinguish HARDWALL etc, when there's unconditional intersection between mempolicy and cpuset. I would expect the nodemask adjustment for saving overhead in g_p_f(), but that clearly doesn't happen in the current form. So we have both crazy complexity and overhead, AFAICS. [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz This patch (of 4): Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") we have a wrong check for NULL preferred_zone, which can theoretically happen due to concurrent cpuset modification. We check the zoneref pointer which is never NULL and we should check the zone pointer. Also document this in first_zones_zonelist() comment per Michal Hocko. Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	kernel/panic.c: add missing \n	Jiri Slaby	1	-1/+1
	When a system panics, the "Rebooting in X seconds.." message is never printed because it lacks a new line. Fix it. Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	fbdev: color map copying bounds checking	Kees Cook	1	-12/+14
	Copying color maps to userspace doesn't check the value of to->start, which will cause kernel heap buffer OOB read due to signedness wraps. CVE-2016-8405 Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Peter Pi (@heisecode) of Trend Micro Cc: Min Chong <mchong@google.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	frv: add atomic64_add_unless()	Sudip Mukherjee	1	-0/+16
	The build of frv allmodconfig was failing with the error: lib/atomic64_test.c:209:9: error: implicit declaration of function 'atomic64_add_unless' All the atomic64 operations were defined in frv, but atomic64_add_unless() was not done. Implement atomic64_add_unless() as done in other arches. Link: http://lkml.kernel.org/r/1484781236-6698-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm/mempolicy.c: do not put mempolicy before using its nodemask	Vlastimil Babka	1	-1/+1
	Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by mpol_cond_put() before accessing the embedded nodemask by __alloc_pages_nodemask(). The commit log says it's so "we can use a single exit path within the function" but that's clearly wrong. We can still do that when doing mpol_cond_put() after the allocation attempt. Make sure the mempolicy is not freed prematurely, otherwise __alloc_pages_nodemask() can end up using a bogus nodemask, which could lead e.g. to premature OOM. Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma") Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	radix-tree: fix private list warnings	Matthew Wilcox	1	-1/+1
	The newly introduced warning in radix_tree_free_nodes() was testing the wrong variable; it should have been 'old' instead of 'node'. Fixes: ea07b862ac8e ("mm: workingset: fix use-after-free in shadow node shrinker") Link: http://lkml.kernel.org/r/20170118163746.GA32495@cmpxchg.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	Documentation/filesystems/proc.txt: add VmPin	Fabian Frederick	1	-2/+3
	Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages") added VmPin in /proc/<pid>/status. Report that in Documentation/filesystems/proc.txt Also move Umask after Name to keep correct order. Link: http://lkml.kernel.org/r/20170114201219.30387-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm, memcg: do not retry precharge charges	David Rientjes	1	-2/+2
	When memory.move_charge_at_immigrate is enabled and precharges are depleted during move, mem_cgroup_move_charge_pte_range() will attempt to increase the size of the precharge. Prevent precharges from ever looping by setting __GFP_NORETRY. This was probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is pointless as written. Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path") Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701130208510.69402@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	proc: add a schedule point in proc_pid_readdir()	Eric Dumazet	1	-0/+2
	We have seen proc_pid_readdir() invocations holding cpu for more than 50 ms. Add a cond_resched() to be gentle with other tasks. [akpm@linux-foundation.org: coding style fix] Link: http://lkml.kernel.org/r/1484238380.15816.42.camel@edumazet-glaptop3.roam.corp.google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm: alloc_contig: re-allow CMA to compact FS pages	Lucas Stach	1	-0/+1
	Commit 73e64c51afc5 ("mm, compaction: allow compaction for GFP_NOFS requests") changed compation to skip FS pages if not explicitly allowed to touch them, but missed to update the CMA compact_control. This leads to a very high isolation failure rate, crippling performance of CMA even on a lightly loaded system. Re-allow CMA to compact FS pages by setting the correct GFP flags, restoring CMA behavior and performance to the kernel 4.9 level. Fixes: 73e64c51afc5 (mm, compaction: allow compaction for GFP_NOFS requests) Link: http://lkml.kernel.org/r/20170113115155.24335-1-l.stach@pengutronix.de Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	mm/slub.c: trace free objects at KERN_INFO	Daniel Thompson	1	-10/+13
	Currently when trace is enabled (e.g. slub_debug=T,kmalloc-128 ) the trace messages are mostly output at KERN_INFO. However the trace code also calls print_section() to hexdump the head of a free object. This is hard coded to use KERN_ERR, meaning the console is deluged with trace messages even if we've asked for quiet. Fix this the obvious way but adding a level parameter to print_section(), allowing calls from the trace code to use the same trace level as other trace messages. Link: http://lkml.kernel.org/r/20170113154850.518-1-daniel.thompson@linaro.org Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25	userfaultfd: fix SIGBUS resulting from false rwsem wakeups	Andrea Arcangeli	1	-2/+35
	With >=32 CPUs the userfaultfd selftest triggered a graceful but unexpected SIGBUS because VM_FAULT_RETRY was returned by handle_userfault() despite the UFFDIO_COPY wasn't completed. This seems caused by rwsem waking the thread blocked in handle_userfault() and we can't run up_read() before the wait_event sequence is complete. Keeping the wait_even sequence identical to the first one, would require running userfaultfd_must_wait() again to know if the loop should be repeated, and it would also require retaking the rwsem and revalidating the whole vma status. It seems simpler to wait the targeted wakeup so that if false wakeups materialize we still wait for our specific wakeup event, unless of course there are signals or the uffd was released. Debug code collecting the stack trace of the wakeup showed this: $ ./userfaultfd 100 99999 nr_pages: 25600, nr_pages_per_cpu: 800 bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2 bounces: 99997, mode: rnd ver poll, Bus error (core dumped) save_stack_trace+0x2b/0x50 try_to_wake_up+0x2a6/0x580 wake_up_q+0x32/0x70 rwsem_wake+0xe0/0x120 call_rwsem_wake+0x1b/0x30 up_write+0x3b/0x40 vm_mmap_pgoff+0x9c/0xc0 SyS_mmap_pgoff+0x1a9/0x240 SyS_mmap+0x22/0x30 entry_SYSCALL_64_fastpath+0x1f/0xbd 0xffffffffffffffff FAULT_FLAG_ALLOW_RETRY missing 70 CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G W 4.8.0+ #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xb8/0x112 handle_userfault+0x572/0x650 handle_mm_fault+0x12cb/0x1520 __do_page_fault+0x175/0x500 trace_do_page_fault+0x61/0x270 do_async_page_fault+0x19/0x90 async_page_fault+0x25/0x30 This always happens when the main userfault selftest thread is running clone() while glibc runs either mprotect or mmap (both taking mmap_sem down_write()) to allocate the thread stack of the background threads, while locking/userfault threads already run at full throttle and are susceptible to false wakeups that may cause handle_userfault() to return before than expected (which results in graceful SIGBUS at the next attempt). This was reproduced only with >=32 CPUs because the loop to start the thread where clone() is too quick with fewer CPUs, while with 32 CPUs there's already significant activity on ~32 locking and userfault threads when the last background threads are started with clone(). This >=32 CPUs SMP race condition is likely reproducible only with the selftest because of the much heavier userfault load it generates if compared to real apps. We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a patch floating around that provides it also hidden this problem but in reality only is successfully at hiding the problem. False wakeups could still happen again the second time handle_userfault() is invoked, even if it's a so rare race condition that getting false wakeups twice in a row is impossible to reproduce. This full fix is needed for correctness, the only alternative would be to allow VM_FAULT_RETRY to be returned infinitely. With this fix the WP support can stick to a strict "one more" VM_FAULT_RETRY logic (no need of returning it infinite times to avoid the SIGBUS). Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Shubham Kumar Sharma <shubham.kumar.sharma@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>