summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-09-01perf tools: Correct SNOOPX field offsetAl Grant1-1/+1
perf_event.h has macros that define the field offsets in the data_src bitmask in perf records. The SNOOPX and REMOTE offsets were both 37. These are distinct fields, and the bitfield layout in perf_mem_data_src confirms that SNOOPX should be at offset 38. Committer notes: This was extracted from a larger patch that also contained kernel changes. Fixes: 52839e653b5629bd ("perf tools: Add support for printing new mem_info encodings") Signed-off-by: Al Grant <al.grant@arm.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/9974f2d0-bf7f-518e-d9f7-4520e5ff1bb0@foss.arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf intel-pt: Fix corrupt data after perf inject fromAl Grant1-1/+8
Commit 42bbabed09ce6208 ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from Intel PT were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new format. Fixes: 42bbabed09ce6208 ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant <al.grant@arm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200819084751.17686-2-leo.yan@linaro.org Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf cs-etm: Fix corrupt data after perf inject fromAl Grant1-1/+8
Commit 42bbabed09ce6208 ("perf tools: Add hw_idx in struct branch_stack") changed the format of branch stacks in perf samples. When samples use this new format, a flag must be set in the corresponding event. Synthesized branch stacks generated from CoreSight ETM trace were using the new format, but not setting the event attribute, leading to consumers seeing corrupt data. This patch fixes the issue by setting the event attribute to indicate use of the new format. Fixes: 42bbabed09ce6208 ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Al Grant <al.grant@arm.com> Reviewed-by: Andrea Brunato <andrea.brunato@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Leo Yan <leo.yan@linaro.org> Link: http://lore.kernel.org/lkml/20200819084751.17686-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf top/report: Fix infinite loop in the TUI for grouped eventsArnaldo Carvalho de Melo1-1/+2
For a while we need to have a dummy event for doing things like receiving PERF_RECORD_COMM, PERF_RECORD_EXEC, etc for threads being created and dying while we synthesize the pre-existing ones at tool start. This 'dummy' event is needed for keeping track of thread lifetime events early in the session but are uninteresting otherwise, i.e. no need to have it in a initial events menu for the non-grouped case, i.e. for: # perf top -e cycles,instructions or even for plain: # perf top When 'cycles' and that 'dummy' event are in place. The code to remove that 'dummy' event ended up creating an endless loop for the grouped case, i.e.: # perf top -e '{cycles,instructions}' Fix it. Fixes: bee9ca1c8a237ca1 ("perf report TUI: Remove needless 'dummy' event from menu") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf parse-events: Avoid an uninitialized read when using fake PMUsIan Rogers1-13/+17
With a fake_pmu the pmu_info isn't populated by perf_pmu__check_alias. In this case, don't try to copy the uninitialized values to the evsel. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200826042910.1902374-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf stat: Fix out of bounds array access in the print_counters() evlist methodThomas Richter1-1/+1
Fix a compile error on F32 and gcc version 10.1 on s390 in file utils/stat-display.c. The error does not show up with make DEBUG=y. In fact the issue shows up when using both compiler options -O6 and -D_FORTIFY_SOURCE=2 (which are omitted with DEBUG=Y). This is the offending call chain: print_counter_aggr() printout(config, -1, 0, ...) with 2nd parm id set to -1 aggr_printout(config, x, id --> -1, ...) which leads to this code: case AGGR_NONE: if (evsel->percore && !config->percore_show_thread) { .... } else { fprintf(config->output, "CPU%*d%s", config->csv_output ? 0 : -7, evsel__cpus(evsel)->map[id], ^^ id is -1 !!!! config->csv_sep); } This is a compiler inlining issue which is detected on s390 but not on other plattforms. Output before: # make util/stat-display.o ..... util/stat-display.c: In function ‘perf_evlist__print_counters’: util/stat-display.c:121:4: error: array subscript -1 is below array bounds of ‘int[]’ [-Werror=array-bounds] 121 | fprintf(config->output, "CPU%*d%s", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 122 | config->csv_output ? 0 : -7, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 123 | evsel__cpus(evsel)->map[id], | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 124 | config->csv_sep); | ~~~~~~~~~~~~~~~~ In file included from util/evsel.h:13, from util/evlist.h:13, from util/stat-display.c:9: /root/linux/tools/lib/perf/include/internal/cpumap.h:10:7: note: while referencing ‘map’ 10 | int map[]; | ^~~ cc1: all warnings being treated as errors mv: cannot stat 'util/.stat-display.o.tmp': No such file or directory make[3]: *** [/root/linux/tools/build/Makefile.build:97: util/stat-display.o] Error 1 make[2]: *** [Makefile.perf:716: util/stat-display.o] Error 2 make[1]: *** [Makefile.perf:231: sub-make] Error 2 make: *** [Makefile:110: util/stat-display.o] Error 2 [root@t35lp46 perf]# Output after: # make util/stat-display.o ..... CC util/stat-display.o [root@t35lp46 perf]# Committer notes: Removed the removal of {} enclosing the multiline else block, as pointed out by Jiri Olsa. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825063304.77733-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf test: Set NULL sentinel in pmu_events table in "Parse and process ↵Thomas Richter1-0/+3
metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082 #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159 #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189 #12 test_ipc () at tests/parse-metric.c:208 ..... ..... omitted many more lines This test case was added with commit 218ca91df477 ("perf tests: Add parse metric test for frontend metric"). When I compile with make DEBUG=y it works fine and I do not get a core dump. It turned out that the above listed function call chain worked on a struct pmu_event array which requires a trailing element with zeroes which was missing. The marco map_for_each_event() loops over that array tests for members metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes the issue. Output after: [root@t35lp46 perf]# ./perf test 67 67: Parse and process metrics : Ok [root@t35lp46 perf]# Committer notes: As Ian remarks, this is not s390 specific: <quote Ian> This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag. ================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af9c681ac2a ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf parse-events: Set exclude_guest=1 for user-space countingJin Yao2-2/+5
Currently if we run 'perf record -e cycles:u', exclude_guest=0. But it doesn't make sense in most cases that we request for user-space counting but we also get the guest report. Of course, we also need to consider 'perf kvm' usage case that authorized perf users on the host may only want to count guest user space events. For example, # perf kvm --guest record -e cycles:u When we have 'exclude_guest=1' for 'perf kvm' usage, we may get nothing from guest events. To keep perf semantics consistent and clear, this patch sets exclude_guest=1 for user-space counting but except for 'perf kvm' usage. Before: perf record -e cycles:u ./div perf evlist -v cycles:u: ..., exclude_kernel: 1, exclude_hv: 1, ... After: perf record -e cycles:u ./div perf evlist -v cycles:u: ..., exclude_kernel: 1, exclude_hv: 1, exclude_guest: 1, ... Before: perf kvm --guest record -e cycles:u -vvv perf_event_attr: size 120 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 exclude_kernel 1 exclude_hv 1 freq 1 sample_id_all 1 After: perf kvm --guest record -e cycles:u -vvv perf_event_attr: size 120 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 exclude_kernel 1 exclude_hv 1 freq 1 sample_id_all 1 For Before/After, exclude_guest are both 0 for perf kvm usage. perf test 6 6: Parse event definition strings : Ok Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Like Xu <like.xu@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200814012120.16647-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf record: Correct the help info of option "--no-bpf-event"Wei Li1-1/+1
The help info of option "--no-bpf-event" is wrongly described as "record bpf events", correct it. Committer testing: $ perf record -h bpf Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] --clang-opt <clang options> options passed to clang when compiling BPF scriptlets --clang-path <clang path> clang binary to use for compiling BPF scriptlets --no-bpf-event do not record bpf events $ Fixes: 71184c6ab7e6 ("perf record: Replace option --bpf-event with --no-bpf-event") Signed-off-by: Wei Li <liwei391@huawei.com> Acked-by: Song Liu <songliubraving@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200819031947.12115-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf tools: Use %zd for size_t printf formats on 32-bitChris Wilson2-2/+2
A couple of trivial fixes for using %zd for size_t in the code supporting the ZSTD compression library. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200820212501.24421-1-chris@chris-wilson.co.uk Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21MAINTAINERS: Add entries for CoreSight and Arm SPE toolingMathieu Poirier1-1/+7
Add entries for perf tools elements related to the support of ARM CoreSight and ARM SPE. Also lump in arm and arm64 architecture files to provide coverage. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200820175510.3935932-1-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf: arm-spe: Fix check error when synthesizing eventsWei Li1-3/+3
In arm_spe_read_record(), when we are processing an events packet, 'decoder->packet.index' is the length of payload, which has been transformed in payloadlen(). So correct the check of 'idx'. Signed-off-by: Wei Li <liwei391@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200724072628.35904-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf symbols: Add mwait_idle_with_hints.constprop.0 to the list of idle symbolsArnaldo Carvalho de Melo1-0/+1
The "mwait_idle_with_hints" one was already there, some compiler artifact now adds this ".constprop.0" suffix, cover that one too. At some point we need to put these in a special bucket and show it somewhere on the screen. Noticed building the kernel on a fedora:32 system using: gcc version 10.2.1 20200723 (Red Hat 10.2.1-1) (GCC) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf top: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not setTiezhu Yang1-0/+2
When I execute 'perf top' without HAVE_LIBBPF_SUPPORT, there exists the following segmentation fault, skip the side-band event setup to fix it, this is similar with commit 1101c872c8c7 ("perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set"). [yangtiezhu@linux perf]$ ./perf top <SNIP> perf: Segmentation fault Obtained 6 stack frames. ./perf(sighandler_dump_stack+0x5c) [0x12011b604] [0xffffffc010] ./perf(perf_mmap__read_init+0x3e) [0x1201feeae] ./perf() [0x1200d715c] /lib64/libpthread.so.0(+0xab9c) [0xffee10ab9c] /lib64/libc.so.6(+0x128f4c) [0xffedc08f4c] Segmentation fault [yangtiezhu@linux perf]$ I use git bisect to find commit b38d85ef49cf ("perf bpf: Decouple creating the evlist from adding the SB event") is the first bad commit, so also add the Fixes tag. Committer testing: First build perf explicitely disabling libbpf: $ make NO_LIBBPF=1 O=/tmp/build/perf -C tools/perf install-bin && perf test python Now make sure it isn't linked: $ perf -vv | grep -w bpf bpf: [ OFF ] # HAVE_LIBBPF_SUPPORT $ $ nm ~/bin/perf | grep libbpf $ And now try to run 'perf top': # perf top perf: Segmentation fault -------- backtrace -------- perf[0x5bcd6d] /lib64/libc.so.6(+0x3ca6f)[0x7fd0f5a66a6f] perf(perf_mmap__read_init+0x1e)[0x5e1afe] perf[0x4cc468] /lib64/libpthread.so.0(+0x9431)[0x7fd0f645a431] /lib64/libc.so.6(clone+0x42)[0x7fd0f5b2b912] # Applying this patch fixes the issue. Fixes: b38d85ef49cf ("perf bpf: Decouple creating the evlist from adding the SB event") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1597753837-16222-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf sched timehist: Fix use of CPU list with summary optionDavid Ahern1-1/+5
Do not update thread stats or show idle summary unless CPU is in the list of interest. Fixes: c30d630d1bcfad8d ("perf sched timehist: Add support for filtering on CPU") Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200817170943.1486-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf test: Fix basic bpf filtering testSumanth Korikkar1-1/+1
BPF basic filtering test fails on s390x (when vmlinux debuginfo is utilized instead of /proc/kallsyms) Info: - bpf_probe_load installs the bpf code at do_epoll_wait. - For s390x, do_epoll_wait resolves to 3 functions including inlines. found inline addr: 0x43769e Probe point found: __s390_sys_epoll_wait+6 found inline addr: 0x437290 Probe point found: do_epoll_wait+0 found inline addr: 0x4375d6 Probe point found: __se_sys_epoll_wait+6 - add_bpf_event creates evsel for every probe in a BPF object. This results in 3 evsels. Solution: - Expected result = 50% of the samples to be collected from epoll_wait * number of entries present in the evlist. Committer testing: # perf test 42 42: BPF filter : 42.1: Basic BPF filtering : Ok 42.2: BPF pinning : Ok 42.3: BPF prologue generation : Ok 42.4: BPF relocation checker : Ok # Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: bpf@vger.kernel.org Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> LPU-Reference: 20200817072754.58344-1-sumanthk@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21Merge tag 'pci-v5.9-fixes-1' of ↵Linus Torvalds1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix P2PDMA build issue (Christoph Hellwig)" * tag 'pci-v5.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/P2PDMA: Fix build without DMA ops
2020-08-20Merge tag 'dma-mapping-5.9-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds5-78/+92
Pull dma-mapping fixes from Christoph Hellwig: "Fix more fallout from the dma-pool changes (Nicolas Saenz Julienne, me)" * tag 'dma-mapping-5.9-1' of git://git.infradead.org/users/hch/dma-mapping: dma-pool: Only allocate from CMA when in same memory zone dma-pool: fix coherent pool allocations for IOMMU mappings
2020-08-20afs: Fix key ref leak in afs_put_operation()David Howells1-0/+1
The afs_put_operation() function needs to put the reference to the key that's authenticating the operation. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-20Merge tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfioLinus Torvalds3-29/+164
Pull VFIO fixes from Alex Williamson: - Fix lockdep issue reported for recursive read-lock (Alex Williamson) - Fix missing unwind in type1 replay function (Alex Williamson) * tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfio: vfio/type1: Add proper error unwind for vfio_iommu_replay() vfio-pci: Avoid recursive read-lock usage
2020-08-19lib/string.c: Use freestanding environmentArvind Sankar1-1/+6
gcc can transform the loop in a naive implementation of memset/memcpy etc into a call to the function itself. This optimization is enabled by -ftree-loop-distribute-patterns. This has been the case for a while, but gcc-10.x enables this option at -O2 rather than -O3 as in previous versions. Add -ffreestanding, which implicitly disables this optimization with gcc. It is unclear whether clang performs such optimizations, but hopefully it will also not do so in a freestanding environment. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19x86/boot/compressed: Use builtin mem functions for decompressorArvind Sankar2-9/+3
Since commits c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions") fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c") the decompressor stub has been using the compiler's builtin memcpy, memset and memcmp functions, _except_ where it would likely have the largest impact, in the decompression code itself. Remove the #undef's of memcpy and memset in misc.c so that the decompressor code also uses the compiler builtins. The rationale given in the comment doesn't really apply: just because some functions use the out-of-line version is no reason to not use the builtin version in the rest. Replace the comment with an explanation of why memzero and memmove are being #define'd. Drop the suggestion to #undef in boot/string.h as well: the out-of-line versions are not really optimized versions, they're generic code that's good enough for the preboot environment. The compiler will likely generate better code for constant-size memcpy/memset/memcmp if it is allowed to. Most decompressors' performance is unchanged, with the exception of LZ4 and 64-bit ZSTD. Before After ARCH LZ4 73ms 10ms 32 LZ4 120ms 10ms 64 ZSTD 90ms 74ms 64 Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels. Decompressor code size has small differences, with the largest being that 64-bit ZSTD decreases just over 2k. The largest code size increase was on 64-bit XZ, of about 400 bytes. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Suggested-by: Nick Terrell <nickrterrell@gmail.com> Tested-by: Nick Terrell <nickrterrell@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19Merge tag 'spi-fix-v5.9-rc1' of ↵Linus Torvalds4-39/+86
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A bunch of fixes that came in for SPI during the merge window. Some from ST and others for their controller, one from Lukas for a race between device addition and controller unregistration and one from fix from Geert for the DT bindings which unbreaks validation" * tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: dt-bindings: lpspi: Add missing boolean type for fsl,spi-only-use-cs1-sel spi: stm32: always perform registers configuration prior to transfer spi: stm32: fixes suspend/resume management spi: stm32: fix stm32_spi_prepare_mbr in case of odd clk_rate spi: stm32: fix fifo threshold level in case of short transfer spi: stm32h7: fix race condition at end of transfer spi: stm32: clear only asserted irq flags on interrupt spi: Prevent adding devices below an unregistering controller
2020-08-18Merge tag 'fixes-2020-08-18' of ↵Linus Torvalds2-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull ia64 page table fix from Mike Rapoport: "Fix regression in IA-64 caused by page table allocation refactoring The refactoring and consolidation of <asm/pgalloc.h> caused regression on parisc and ia64. The fix for parisc made it into v5.9-rc1 while the fix ia64 got delayed a bit and here it is" * tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: arch/ia64: Restore arch-specific pgd_offset_k implementation
2020-08-18mm/memory.c: skip spurious TLB flush for retried page faultYang Shi1-0/+3
Recently we found regression when running will_it_scale/page_fault3 test on ARM64. Over 70% down for the multi processes cases and over 20% down for the multi threads cases. It turns out the regression is caused by commit 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault"). The test mmaps a memory size file then write to the mapping, this would make all memory dirty and trigger dirty pages throttle, that upstream commit would release mmap_sem then retry the page fault. The retried page fault would see correct PTEs installed then just fall through to spurious TLB flush. The regression is caused by the excessive spurious TLB flush. It is fine on x86 since x86's spurious TLB flush is no-op. We could just skip the spurious TLB flush to mitigate the regression. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Xu Yu <xuyu@linux.alibaba.com> Debugged-by: Xu Yu <xuyu@linux.alibaba.com> Tested-by: Xu Yu <xuyu@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-18Merge tag 'pstore-v5.9-rc2' of ↵Linus Torvalds1-57/+58
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull mailmap update from Kees Cook: "This was originally part of my pstore tree, but when I realized that mailmap needed re-alphabetizing, I decided to wait until -rc1 to send this, as I saw a lot of mailmap additions pending in -next for the merge window. It's a programmatic reordering and the addition of a pstore contributor's preferred email address" * tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: mailmap: Add WeiXiong Liao mailmap: Restore dictionary sorting
2020-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds76-349/+768
Pull networking fixes from David Miller: "Another batch of fixes: 1) Remove nft_compat counter flush optimization, it generates warnings from the refcount infrastructure. From Florian Westphal. 2) Fix BPF to search for build id more robustly, from Jiri Olsa. 3) Handle bogus getopt lengths in ebtables, from Florian Westphal. 4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and Oleksij Rempel. 5) Reset iter properly on mptcp sendmsg() error, from Florian Westphal. 6) Show a saner speed in bonding broadcast mode, from Jarod Wilson. 7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones. 8) Fix double unregister in bonding during namespace tear down, from Cong Wang. 9) Disable RP filter during icmp_redirect selftest, from David Ahern" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) otx2_common: Use devm_kcalloc() in otx2_config_npa() net: qrtr: fix usage of idr in port assignment to socket selftests: disable rp_filter for icmp_redirect.sh Revert "net: xdp: pull ethernet header off packet after computing skb->protocol" phylink: <linux/phylink.h>: fix function prototype kernel-doc warning mptcp: sendmsg: reset iter on error redux net: devlink: Remove overzealous WARN_ON with snapshots tipc: not enable tipc when ipv6 works as a module tipc: fix uninit skb->data in tipc_nl_compat_dumpit() net: Fix potential wrong skb->protocol in skb_vlan_untag() net: xdp: pull ethernet header off packet after computing skb->protocol ipvlan: fix device features bonding: fix a potential double-unregister can: j1939: add rxtimer for multipacket broadcast session can: j1939: abort multipacket broadcast session when timeout occurs can: j1939: cancel rxtimer on multipacket broadcast session complete can: j1939: fix support for multipacket broadcast message net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs' net: fddi: skfp: cfm: Remove set but unused variable 'oldstate' net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs' ...
2020-08-18otx2_common: Use devm_kcalloc() in otx2_config_npa()Xu Wang1-2/+2
A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "devm_kcalloc". Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18PCI/P2PDMA: Fix build without DMA opsChristoph Hellwig1-4/+6
My commit to make DMA ops support optional missed the reference in the p2pdma code. And while the build bot didn't manage to find a config where this can happen, Matthew did. Fix this by replacing two IS_ENABLED checks with ifdefs. Fixes: 2f9237d4f6df ("dma-mapping: make support for dma ops optional") Link: https://lore.kernel.org/r/20200810124843.1532738-1-hch@lst.de Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-08-18net: qrtr: fix usage of idr in port assignment to socketNecip Fazil Yildiran1-9/+11
Passing large uint32 sockaddr_qrtr.port numbers for port allocation triggers a warning within idr_alloc() since the port number is cast to int, and thus interpreted as a negative number. This leads to the rejection of such valid port numbers in qrtr_port_assign() as idr_alloc() fails. To avoid the problem, switch to idr_alloc_u32() instead. Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-by: syzbot+f31428628ef672716ea8@syzkaller.appspotmail.com Signed-off-by: Necip Fazil Yildiran <necip@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18selftests: disable rp_filter for icmp_redirect.shDavid Ahern1-0/+2
h1 is initially configured to reach h2 via r1 rather than the more direct path through r2. If rp_filter is set and inherited for r2, forwarding fails since the source address of h1 is reachable from eth0 vs the packet coming to it via r1 and eth1. Since rp_filter setting affects the test, explicitly reset it. Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18mailmap: Add WeiXiong LiaoKees Cook1-0/+1
WeiXiong Liao noted to me offlist that his preference for email address had changed and that he'd like it updated in the mailmap so people discussing pstore/blk would be able to reach him. Cc: WeiXiong Liao <gmpy.liaowx@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-18mailmap: Restore dictionary sortingKees Cook1-57/+57
Several names had been recently appended (instead of inserted). While git-shortlog doesn't need this file to be sorted, it helps humans to keep it organized this way. Sort the entire file (which includes some minor shuffling for dictionary order). Done with the following commands: grep -E '^(#|$)' .mailmap > .mailmap.head grep -Ev '^(#|$)' .mailmap > .mailmap.body sort -f .mailmap.body > .mailmap.body.sort cat .mailmap.head .mailmap.body.sort > .mailmap rm .mailmap.head .mailmap.body.sort Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-17arch/ia64: Restore arch-specific pgd_offset_k implementationJessica Clarke2-0/+11
IA-64 is special and treats pgd_offset_k() differently to pgd_offset(), using different formulae to calculate the indices into the kernel and user PGDs. The index into the user PGDs takes into account the region number, but the index into the kernel (init_mm) PGD always assumes a predefined kernel region number. Commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which incorrectly used pgd_index() for kernel page tables. As a result, the index into the kernel PGD was going out of bounds and the kernel hung during early boot. Allow overrides of pgd_offset_k() and override it on IA-64 with the old implementation that will correctly index the kernel PGD. Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-08-17Revert "net: xdp: pull ethernet header off packet after computing skb->protocol"David S. Miller1-1/+0
This reverts commit f8414a8d886b613b90d9fdf7cda6feea313b1069. eth_type_trans() does the necessary pull on the skb. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17phylink: <linux/phylink.h>: fix function prototype kernel-doc warningRandy Dunlap1-1/+2
Fix a kernel-doc warning for the pcs_config() function prototype: ../include/linux/phylink.h:406: warning: Excess function parameter 'permit_pause_to_mac' description in 'pcs_config' Fixes: 7137e18f6f88 ("net: phylink: add struct phylink_pcs") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17vfio/type1: Add proper error unwind for vfio_iommu_replay()Alex Williamson1-5/+66
The vfio_iommu_replay() function does not currently unwind on error, yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma structure to indicate IOMMU mapping. The IOMMU mappings are torn down when the domain is destroyed, but the other actions go on to cause trouble later. For example, the iommu->domain_list can be empty if we only have a non-IOMMU backed mdev attached. We don't currently check if the list is empty before getting the first entry in the list, which leads to a bogus domain pointer. If a vfio_dma entry is erroneously marked as iommu_mapped, we'll attempt to use that bogus pointer to retrieve the existing physical page addresses. This is the scenario that uncovered this issue, attempting to hot-add a vfio-pci device to a container with an existing mdev device and DMA mappings, one of which could not be pinned, causing a failure adding the new group to the existing container and setting the conditions for a subsequent attempt to explode. To resolve this, we can first check if the domain_list is empty so that we can reject replay of a bogus domain, should we ever encounter this inconsistent state again in the future. The real fix though is to add the necessary unwind support, which means cleaning up the current pinning if an IOMMU mapping fails, then walking back through the r-b tree of DMA entries, reading from the IOMMU which ranges are mapped, and unmapping and unpinning those ranges. To be able to do this, we also defer marking the DMA entry as IOMMU mapped until all entries are processed, in order to allow the unwind to know the disposition of each entry. Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17vfio-pci: Avoid recursive read-lock usageAlex Williamson2-24/+98
A down_read on memory_lock is held when performing read/write accesses to MMIO BAR space, including across the copy_to/from_user() callouts which may fault. If the user buffer for these copies resides in an mmap of device MMIO space, the mmap fault handler will acquire a recursive read-lock on memory_lock. Avoid this by reducing the lock granularity. Sequential accesses requiring multiple ioread/iowrite cycles are expected to be rare, therefore typical accesses should not see additional overhead. VGA MMIO accesses are expected to be non-fatal regardless of the PCI memory enable bit to allow legacy probing, this behavior remains with a comment added. ioeventfds are now included in memory access testing, with writes dropped while memory space is disabled. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17watch_queue: Limit the number of watches a user can holdDavid Howells2-0/+11
Impose a limit on the number of watches that a user can hold so that they can't use this mechanism to fill up all the available memory. This is done by putting a counter in user_struct that's incremented when a watch is allocated and decreased when it is released. If the number exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN. This can be tested by the following means: (1) Create a watch queue and attach it to fd 5 in the program given - in this case, bash: keyctl watch_session /tmp/nlog /tmp/gclog 5 bash (2) In the shell, set the maximum number of files to, say, 99: ulimit -n 99 (3) Add 200 keyrings: for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done (4) Try to watch all of the keyrings: for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done This should fail when the number of watches belonging to the user hits 99. (5) Remove all the keyrings and all of those watches should go away: for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done (6) Kill off the watch queue by exiting the shell spawned by watch_session. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-17mptcp: sendmsg: reset iter on error reduxFlorian Westphal1-3/+5
This fix wasn't correct: When this function is invoked from the retransmission worker, the iterator contains garbage and resetting it causes a crash. As the work queue should not be performance critical also zero the msghdr struct. Fixes: 35759383133f64d "(mptcp: sendmsg: reset iter on error)" Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17net: devlink: Remove overzealous WARN_ON with snapshotsAndrew Lunn1-1/+1
It is possible to trigger this WARN_ON from user space by triggering a devlink snapshot with an ID which already exists. We don't need both -EEXISTS being reported and spamming the kernel log. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17tipc: not enable tipc when ipv6 works as a moduleXin Long1-0/+1
When using ipv6_dev_find() in one module, it requires ipv6 not to work as a module. Otherwise, this error occurs in build: undefined reference to `ipv6_dev_find'. So fix it by adding "depends on IPV6 || IPV6=n" to tipc/Kconfig, as it does in sctp/Kconfig. Fixes: 5a6f6f579178 ("tipc: set ub->ifindex for local ipv6 address") Reported-by: kernel test robot <lkp@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17tipc: fix uninit skb->data in tipc_nl_compat_dumpit()Cong Wang1-1/+11
__tipc_nl_compat_dumpit() has two callers, and it expects them to pass a valid nlmsghdr via arg->data. This header is artificial and crafted just for __tipc_nl_compat_dumpit(). tipc_nl_compat_publ_dump() does so by putting a genlmsghdr as well as some nested attribute, TIPC_NLA_SOCK. But the other caller tipc_nl_compat_dumpit() does not, this leaves arg->data uninitialized on this call path. Fix this by just adding a similar nlmsghdr without any payload in tipc_nl_compat_dumpit(). This bug exists since day 1, but the recent commit 6ea67769ff33 ("net: tipc: prepare attrs in __tipc_nl_compat_dumpit()") makes it easier to appear. Reported-and-tested-by: syzbot+0e7181deafa7e0b79923@syzkaller.appspotmail.com Fixes: d0796d1ef63d ("tipc: convert legacy nl bearer dump to nl compat") Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller8-80/+73
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Endianness issue in IPv4 option support in nft_exthdr, from Stephen Suryaputra. 2) Removes the waitcount optimization in nft_compat, from Florian Westphal. 3) Remove ipv6 -> nf_defrag_ipv6 module dependency, from Florian Westphal. 4) Memleak in chain binding support, also from Florian. 5) Simplify nft_flowtable.sh selftest, from Fabian Frederick. 6) Optional MTU arguments for selftest nft_flowtable.sh, also from Fabian. 7) Remove noise error report when killing process in selftest nft_flowtable.sh, from Fabian Frederick. 8) Reject bogus getsockopt option length in ebtables, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17Merge tag 'linux-can-fixes-for-5.9-20200815' of ↵David S. Miller1-12/+36
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-08-15 this is a pull request of 4 patches for net/master. All patches are by Zhang Changzhong and fix broadcast related problems in the j1939 CAN networking stack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17net: Fix potential wrong skb->protocol in skb_vlan_untag()Miaohe Lin1-2/+2
We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or we may access the wrong data. Fixes: 0d5501c1c828 ("net: Always untag vlan-tagged traffic on input.") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17net: xdp: pull ethernet header off packet after computing skb->protocolJason A. Donenfeld1-0/+1
When an XDP program changes the ethernet header protocol field, eth_type_trans is used to recalculate skb->protocol. In order for eth_type_trans to work correctly, the ethernet header must actually be part of the skb data segment, so the code first pushes that onto the head of the skb. However, it subsequently forgets to pull it back off, making the behavior of the passed-on packet inconsistent between the protocol modifying case and the static protocol case. This patch fixes the issue by simply pulling the ethernet header back off of the skb head. Fixes: 297249569932 ("net: fix generic XDP to handle if eth header was mangled") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17ipvlan: fix device featuresMahesh Bandewar1-5/+22
Processing NETDEV_FEAT_CHANGE causes IPvlan links to lose NETIF_F_LLTX feature because of the incorrect handling of features in ipvlan_fix_features(). --before-- lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: on [fixed] lpaa10:~# ethtool -K ipvl0 tso off Cannot change tcp-segmentation-offload Actual changes: vlan-challenged: off [fixed] tx-lockless: off [fixed] lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: off [fixed] lpaa10:~# --after-- lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: on [fixed] lpaa10:~# ethtool -K ipvl0 tso off Cannot change tcp-segmentation-offload Could not change any device features lpaa10:~# ethtool -k ipvl0 | grep tx-lockless tx-lockless: on [fixed] lpaa10:~# Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17bonding: fix a potential double-unregisterCong Wang1-1/+2
When we tear down a network namespace, we unregister all the netdevices within it. So we may queue a slave device and a bonding device together in the same unregister queue. If the only slave device is non-ethernet, it would automatically unregister the bonding device as well. Thus, we may end up unregistering the bonding device twice. Workaround this special case by checking reg_state. Fixes: 9b5e383c11b0 ("net: Introduce unregister_netdevice_many()") Reported-by: syzbot+af23e7f3e0a7e10c8b67@syzkaller.appspotmail.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-16Linux 5.9-rc1v5.9-rc1Linus Torvalds1-2/+2