kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-08-25	selftests/kprobe: Do not test for GRP/ without event failures	Steven Rostedt (Google)	1	-1/+0
	[ Upstream commit f5eab65ff2b76449286d18efc7fee3e0b72f7d9b ] A new feature is added where kprobes (and other probes) do not need to explicitly state the event name when creating a probe. The event name will come from what is being attached. That is: # echo 'p:foo/ vfs_read' > kprobe_events Will no longer error, but instead create an event: # cat kprobe_events p:foo/p_vfs_read_0 vfs_read This should not be tested as an error case anymore. Remove it from the selftest as now this feature "breaks" the selftest as it no longer fails as expected. Link: https://lore.kernel.org/all/1656296348-16111-1-git-send-email-quic_linyyuan@quicinc.com/ Link: https://lkml.kernel.org/r/20220712161707.6dc08a14@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	tools build: Switch to new openssl API for test-libcrypto	Roberto Sassu	1	-4/+11
	commit 5b245985a6de5ac18b5088c37068816d413fb8ed upstream. Switch to new EVP API for detecting libcrypto, as Fedora 36 returns an error when it encounters the deprecated function MD5_Init() and the others. The error would be interpreted as missing libcrypto, while in reality it is not. Fixes: 6e8ccb4f624a73c5 ("tools/bpf: properly account for libbfd variations") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: llvm@lists.linux.dev Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220719170555.2576993-4-roberto.sassu@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25	tools/vm/slabinfo: use alphabetic order when two values are equal	Yuanzheng Song	1	-10/+22
	commit 4f5ceb8851f0081af54313abbf56de1615911faf upstream. When the number of partial slabs in each cache is the same (e.g., the value are 0), the results of the `slabinfo -X -N5` and `slabinfo -P -N5` are different. / # slabinfo -X -N5 ... Slabs sorted by number of partial slabs --------------------------------------- Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg inode_cache 15180 392 6217728 758/0/1 20 1 0 95 a kernfs_node_cache 22494 88 2002944 488/0/1 46 0 0 98 shmem_inode_cache 663 464 319488 38/0/1 17 1 0 96 biovec-max 50 3072 163840 4/0/1 10 3 0 93 A dentry 19050 136 2600960 633/0/2 30 0 0 99 a / # slabinfo -P -N5 Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg bdev_cache 32 984 32.7K 1/0/1 16 2 0 96 Aa ext4_inode_cache 42 752 32.7K 1/0/1 21 2 0 96 a dentry 19050 136 2.6M 633/0/2 30 0 0 99 a TCPv6 17 1840 32.7K 0/0/1 17 3 0 95 A RAWv6 18 856 16.3K 0/0/1 18 2 0 94 A This problem is caused by the sort_slabs(). So let's use alphabetic order when two values are equal in the sort_slabs(). By the way, the content of the `slabinfo -h` is not aligned because the `-P\|--partial Sort by number of partial slabs` uses tabs instead of spaces. So let's use spaces instead of tabs to fix it. Link: https://lkml.kernel.org/r/20220528063117.935158-1-songyuanzheng@huawei.com Fixes: 1106b205a3fe ("tools/vm/slabinfo: add partial slab listing to -X") Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com> Cc: "Tobin C. Harding" <tobin@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25	tools/thermal: Fix possible path truncations	Florian Fainelli	1	-11/+13
	[ Upstream commit 6c58cf40e3a1d2f47c09d3489857e9476316788a ] A build with -D_FORTIFY_SOURCE=2 enabled will produce the following warnings: sysfs.c:63:30: warning: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 255 [-Wformat-truncation=] snprintf(filepath, 256, "%s/%s", path, filename); ^~ Bump up the buffer to PATH_MAX which is the limit and account for all of the possible NUL and separators that could lead to exceeding the allocated buffer sizes. Fixes: 94f69966faf8 ("tools/thermal: Introduce tmon, a tool for thermal subsystem") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO	Arnaldo Carvalho de Melo	1	-1/+5
	[ Upstream commit 91cea6be90e436c55cde8770a15e4dac9d3032d0 ] When genelf was introduced it tested for HAVE_LIBCRYPTO not HAVE_LIBCRYPTO_SUPPORT, which is the define the feature test for openssl defines, fix it. This also adds disables the deprecation warning, someone has to fix this to build with openssl 3.0 before the warning becomes a hard error. Fixes: 9b07e27f88b9cd78 ("perf inject: Add jitdump mmap injection support") Reported-by: 谭梓煊 <tanzixuan.me@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/YulpPqXSOG0Q4J1o@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	perf symbol: Fail to read phdr workaround	Ian Rogers	1	-7/+20
	[ Upstream commit 6d518ac7be6223811ab947897273b1bbef846180 ] The perf jvmti agent doesn't create program headers, in this case fallback on section headers as happened previously. Committer notes: To test this, from a public post by Ian: 1) download a Java workload dacapo-9.12-MR1-bach.jar from https://sourceforge.net/projects/dacapobench/ 2) build perf such as "make -C tools/perf O=/tmp/perf NO_LIBBFD=1" it should detect Java and create /tmp/perf/libperf-jvmti.so 3) run perf with the jvmti agent: perf record -k 1 java -agentpath:/tmp/perf/libperf-jvmti.so -jar dacapo-9.12-MR1-bach.jar -n 10 fop 4) run perf inject: perf inject -i perf.data -o perf-injected.data -j 5) run perf report perf report -i perf-injected.data \| grep org.apache.fop With this patch reverted I see lots of symbols like: 0.00% java jitted-388040-4656.so [.] org.apache.fop.fo.FObj.bind(org.apache.fop.fo.PropertyList) With the patch (2d86612aacb7805f ("perf symbol: Correct address for bss symbols")) I see lots of: dso__load_sym_internal: failed to find program header for symbol: Lorg/apache/fop/fo/FObj;bind(Lorg/apache/fop/fo/PropertyList;)V st_value: 0x40 Fixes: 2d86612aacb7805f ("perf symbol: Correct address for bss symbols") Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20220731164923.691193-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	selftests/bpf: fix a test for snprintf() overflow	Dan Carpenter	1	-1/+1
	[ Upstream commit c5d22f4cfe8dfb93f1db0a1e7e2e7ebc41395d98 ] The snprintf() function returns the number of bytes which would have been copied if there were space. In other words, it can be > sizeof(pin_path). Fixes: c0fa1b6c3efc ("bpf: btf: Add BTF tests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/YtZ+aD/tZMkgOUw+@kili Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	selftests: timers: clocksource-switch: fix passing errors from child	Wolfram Sang	1	-3/+3
	[ Upstream commit 4d8f52ac5fa9eede7b7aa2f2d67c841d9eeb655f ] The return value from system() is a waitpid-style integer. Do not return it directly because with the implicit masking in exit() it will always return 0. Access it with appropriate macros to really pass on errors. Fixes: 7290ce1423c3 ("selftests/timers: Add clocksource-switch test from timetest suite") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: John Stultz <jstultz@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	selftests: timers: valid-adjtimex: build fix for newer toolchains	Wolfram Sang	1	-1/+1
	[ Upstream commit 9a162977d20436be5678a8e21a8e58eb4616d86a ] Toolchains with an include file 'sys/timex.h' based on 3.18 will have a 'clock_adjtime' definition added, so it can't be static in the code: valid-adjtimex.c:43:12: error: static declaration of ‘clock_adjtime’ follows non-static declaration Fixes: e03a58c320e1 ("kselftests: timers: Add adjtimex SETOFFSET validity tests") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: John Stultz <jstultz@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	libbpf: Fix the name of a reused map	Anquan Wu	1	-2/+7
	[ Upstream commit bf3f00378524adae16628cbadbd11ba7211863bb ] BPF map name is limited to BPF_OBJ_NAME_LEN. A map name is defined as being longer than BPF_OBJ_NAME_LEN, it will be truncated to BPF_OBJ_NAME_LEN when a userspace program calls libbpf to create the map. A pinned map also generates a path in the /sys. If the previous program wanted to reuse the map， it can not get bpf_map by name, because the name of the map is only partially the same as the name which get from pinned path. The syscall information below show that map name "process_pinned_map" is truncated to "process_pinned_". bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/process_pinned_map", bpf_fd=0, file_flags=0}, 144) = -1 ENOENT (No such file or directory) bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=4,max_entries=1024, map_flags=0, inner_map_fd=0, map_name="process_pinned_",map_ifindex=0, btf_fd=3, btf_key_type_id=6, btf_value_type_id=10,btf_vmlinux_value_type_id=0}, 72) = 4 This patch check that if the name of pinned map are the same as the actual name for the first (BPF_OBJ_NAME_LEN - 1), bpf map still uses the name which is included in bpf object. Fixes: 26736eb9a483 ("tools: libbpf: allow map reuse") Signed-off-by: Anquan Wu <leiqi96@hotmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/OSZP286MB1725CEA1C95C5CB8E7CCC53FB8869@OSZP286MB1725.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25	thermal/tools/tmon: Include pthread and time headers in tmon.h	Markus Mayer	1	-0/+3
	[ Upstream commit 0cf51bfe999524377fbb71becb583b4ca6d07cfc ] Include sys/time.h and pthread.h in tmon.h, so that types "pthread_mutex_t" and "struct timeval tv" are known when tmon.h references them. Without these headers, compiling tmon against musl-libc will fail with these errors: In file included from sysfs.c:31:0: tmon.h:47:8: error: unknown type name 'pthread_mutex_t' extern pthread_mutex_t input_lock; ^~~~~~~~~~~~~~~ make[3]: * [<builtin>: sysfs.o] Error 1 make[3]: * Waiting for unfinished jobs.... In file included from tui.c:31:0: tmon.h:54:17: error: field 'tv' has incomplete type struct timeval tv; ^~ make[3]: * [<builtin>: tui.o] Error 1 make[2]: * [Makefile:83: tmon] Error 2 Signed-off-by: Markus Mayer <mmayer@broadcom.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Acked-by: Alejandro González <alejandro.gonzalez.correo@gmail.com> Tested-by: Alejandro González <alejandro.gonzalez.correo@gmail.com> Fixes: 94f69966faf8 ("tools/thermal: Introduce tmon, a tool for thermal subsystem") Link: https://lore.kernel.org/r/20220718031040.44714-1-f.fainelli@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-11	x86/speculation: Add RSB VM Exit protections	Daniel Sneddon	1	-0/+1
	commit 2b1299322016731d56807aa49254a5ea3080b6b3 upstream. tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as documented for RET instructions after VM exits. Mitigate it with a new one-entry RSB stuffing mechanism and a new LFENCE. == Background == Indirect Branch Restricted Speculation (IBRS) was designed to help mitigate Branch Target Injection and Speculative Store Bypass, i.e. Spectre, attacks. IBRS prevents software run in less privileged modes from affecting branch prediction in more privileged modes. IBRS requires the MSR to be written on every privilege level change. To overcome some of the performance issues of IBRS, Enhanced IBRS was introduced. eIBRS is an "always on" IBRS, in other words, just turn it on once instead of writing the MSR on every privilege level change. When eIBRS is enabled, more privileged modes should be protected from less privileged modes, including protecting VMMs from guests. == Problem == Here's a simplification of how guests are run on Linux' KVM: void run_kvm_guest(void) { // Prepare to run guest VMRESUME(); // Clean up after guest runs } The execution flow for that would look something like this to the processor: 1. Host-side: call run_kvm_guest() 2. Host-side: VMRESUME 3. Guest runs, does "CALL guest_function" 4. VM exit, host runs again 5. Host might make some "cleanup" function calls 6. Host-side: RET from run_kvm_guest() Now, when back on the host, there are a couple of possible scenarios of post-guest activity the host needs to do before executing host code: * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not touched and Linux has to do a 32-entry stuffing. * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host IBRS=1 shortly after VM exit, has a documented side effect of flushing the RSB except in this PBRSB situation where the software needs to stuff the last RSB entry "by hand". IOW, with eIBRS supported, host RET instructions should no longer be influenced by guest behavior after the host retires a single CALL instruction. However, if the RET instructions are "unbalanced" with CALLs after a VM exit as is the RET in #6, it might speculatively use the address for the instruction after the CALL in #3 as an RSB prediction. This is a problem since the (untrusted) guest controls this address. Balanced CALL/RET instruction pairs such as in step #5 are not affected. == Solution == The PBRSB issue affects a wide variety of Intel processors which support eIBRS. But not all of them need mitigation. Today, X86_FEATURE_RETPOLINE triggers an RSB filling sequence that mitigates PBRSB. Systems setting RETPOLINE need no further mitigation - i.e., eIBRS systems which enable retpoline explicitly. However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RETPOLINE and most of them need a new mitigation. Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE which triggers a lighter-weight PBRSB mitigation versus RSB Filling at vmexit. The lighter-weight mitigation performs a CALL instruction which is immediately followed by a speculative execution barrier (INT3). This steers speculative execution to the barrier -- just like a retpoline -- which ensures that speculation can never reach an unbalanced RET. Then, ensure this CALL is retired before continuing execution with an LFENCE. In other words, the window of exposure is opened at VM exit where RET behavior is troublesome. While the window is open, force RSB predictions sampling for RET targets to a dead end at the INT3. Close the window with the LFENCE. There is a subset of eIBRS systems which are not vulnerable to PBRSB. Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB. Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO. [ bp: Massage, incorporate review comments from Andy Cooper. ] [ Pawan: Update commit message to replace RSB_VMEXIT with RETPOLINE ] Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-11	selftests: KVM: Handle compiler optimizations in ucall	Raghavendra Rao Ananta	1	-5/+4
	[ Upstream commit 9e2f6498efbbc880d7caa7935839e682b64fe5a6 ] The selftests, when built with newer versions of clang, is found to have over optimized guests' ucall() function, and eliminating the stores for uc.cmd (perhaps due to no immediate readers). This resulted in the userspace side always reading a value of '0', and causing multiple test failures. As a result, prevent the compiler from optimizing the stores in ucall() with WRITE_ONCE(). Suggested-by: Ricardo Koller <ricarkol@google.com> Suggested-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Message-Id: <20220615185706.1099208-1-rananta@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-11	selftests/bpf: Fix "dubious pointer arithmetic" test	Jean-Philippe Brucker	1	-4/+4
	commit 3615bdf6d9b19db12b1589861609b4f1c6a8d303 upstream. The verifier trace changed following a bugfix. After checking the 64-bit sign, only the upper bit mask is known, not bit 31. Update the test accordingly. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-11	selftests/bpf: Fix test_align verifier log patterns	Stanislav Fomichev	1	-20/+21
	commit 5366d2269139ba8eb6a906d73a0819947e3e4e0a upstream. Commit 294f2fc6da27 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()") changed the way verifier logs some of its state, adjust the test_align accordingly. Where possible, I tried to not copy-paste the entire log line and resorted to dropping the last closing brace instead. Fixes: 294f2fc6da27 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200515194904.229296-1-sdf@google.com Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-11	bpf: Test_verifier, #70 error message updates for 32-bit right shift	John Fastabend	1	-4/+2
	commit aa131ed44ae1d76637f0dbec33cfcf9115af9bc3 upstream. After changes to add update_reg_bounds after ALU ops and adding ALU32 bounds tracking the error message is changed in the 32-bit right shift tests. Test "#70/u bounds check after 32-bit right shift with 64-bit input FAIL" now fails with, Unexpected error message! EXP: R0 invalid mem access RES: func#0 @0 7: (b7) r1 = 2 8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP2 R10=fp0 fp-8_w=mmmmmmmm 8: (67) r1 <<= 31 9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967296 R10=fp0 fp-8_w=mmmmmmmm 9: (74) w1 >>= 31 10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm 10: (14) w1 -= 2 11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967294 R10=fp0 fp-8_w=mmmmmmmm 11: (0f) r0 += r1 math between map_value pointer and 4294967294 is not allowed And test "#70/p bounds check after 32-bit right shift with 64-bit input FAIL" now fails with, Unexpected error message! EXP: R0 invalid mem access RES: func#0 @0 7: (b7) r1 = 2 8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv2 R10=fp0 fp-8_w=mmmmmmmm 8: (67) r1 <<= 31 9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967296 R10=fp0 fp-8_w=mmmmmmmm 9: (74) w1 >>= 31 10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv0 R10=fp0 fp-8_w=mmmmmmmm 10: (14) w1 -= 2 11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967294 R10=fp0 fp-8_w=mmmmmmmm 11: (0f) r0 += r1 last_idx 11 first_idx 0 regs=2 stack=0 before 10: (14) w1 -= 2 regs=2 stack=0 before 9: (74) w1 >>= 31 regs=2 stack=0 before 8: (67) r1 <<= 31 regs=2 stack=0 before 7: (b7) r1 = 2 math between map_value pointer and 4294967294 is not allowed Before this series we did not trip the "math between map_value pointer..." error because check_reg_sane_offset is never called in adjust_ptr_min_max_vals(). Instead we have a register state that looks like this at line 11, 11: R0_w=map_value(id=0,off=0,ks=8,vs=8, smin_value=0,smax_value=0, umin_value=0,umax_value=0, var_off=(0x0; 0x0)) R1_w=invP(id=0, smin_value=0,smax_value=4294967295, umin_value=0,umax_value=4294967295, var_off=(0xfffffffe; 0x0)) R10=fp(id=0,off=0, smin_value=0,smax_value=0, umin_value=0,umax_value=0, var_off=(0x0; 0x0)) fp-8_w=mmmmmmmm 11: (0f) r0 += r1 In R1 'smin_val != smax_val' yet we have a tnum_const as seen by 'var_off(0xfffffffe; 0x0))' with a 0x0 mask. So we hit this check in adjust_ptr_min_max_vals() if ((known && (smin_val != smax_val \|\| umin_val != umax_val)) \|\| smin_val > smax_val \|\| umin_val > umax_val) { / Taint dst register if offset had invalid bounds derived from * e.g. dead branches. / __mark_reg_unknown(env, dst_reg); return 0; } So we don't throw an error here and instead only throw an error later in the verification when the memory access is made. The root cause in verifier without alu32 bounds tracking is having 'umin_value = 0' and 'umax_value = U64_MAX' from BPF_SUB which we set when 'umin_value < umax_val' here, if (dst_reg->umin_value < umax_val) { / Overflow possible, we know nothing / dst_reg->umin_value = 0; dst_reg->umax_value = U64_MAX; } else { ...} Later in adjust_calar_min_max_vals we previously did a coerce_reg_to_size() which will clamp the U64_MAX to U32_MAX by truncating to 32bits. But either way without a call to update_reg_bounds the less precise bounds tracking will fall out of the alu op verification. After latest changes we now exit adjust_scalar_min_max_vals with the more precise umin value, due to zero extension propogating bounds from alu32 bounds into alu64 bounds and then calling update_reg_bounds. This then causes the verifier to trigger an earlier error and we get the error in the output above. This patch updates tests to reflect new error message. I have a local patch to print entire verifier state regardless if we believe it is a constant so we can get a full picture of the state. Usually if tnum_is_const() then bounds are also smin=smax, etc. but this is not always true and is a bit subtle. Being able to see these states helps understand dataflow imo. Let me know if we want something similar upstream. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158507161475.15666.3061518385241144063.stgit@john-Precision-5820-Tower Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-11	selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads	Jakub Sitnicki	2	-4/+80
	commit 8f50f16ff39dd4e2d43d1548ca66925652f8aff7 upstream. Add coverage to the verifier tests and tests for reading bpf_sock fields to ensure that 32-bit, 16-bit, and 8-bit loads from dst_port field are allowed only at intended offsets and produce expected values. While 16-bit and 8-bit access to dst_port field is straight-forward, 32-bit wide loads need be allowed and produce a zero-padded 16-bit value for backward compatibility. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220130115518.213259-3-jakub@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> [OP: backport to 5.4: cherry-pick verifier changes only] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03	perf symbol: Correct address for bss symbols	Leo Yan	1	-4/+41
	[ Upstream commit 2d86612aacb7805f72873691a2644d7279ed0630 ] When using 'perf mem' and 'perf c2c', an issue is observed that tool reports the wrong offset for global data symbols. This is a common issue on both x86 and Arm64 platforms. Let's see an example, for a test program, below is the disassembly for its .bss section which is dumped with objdump: ... Disassembly of section .bss: 0000000000004040 <completed.0>: ... 0000000000004080 <buf1>: ... 00000000000040c0 <buf2>: ... 0000000000004100 <thread>: ... First we used 'perf mem record' to run the test program and then used 'perf --debug verbose=4 mem report' to observe what's the symbol info for 'buf1' and 'buf2' structures. # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8 # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf2 0x30a8-0x30e8 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf1 0x3068-0x30a8 ... The perf tool relies on libelf to parse symbols, in executable and shared object files, 'st_value' holds a virtual address; 'sh_addr' is the address at which section's first byte should reside in memory, and 'sh_offset' is the byte offset from the beginning of the file to the first byte in the section. The perf tool uses below formula to convert a symbol's memory address to a file address: file_address = st_value - sh_addr + sh_offset ^ ` Memory address We can see the final adjusted address ranges for buf1 and buf2 are [0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is incorrect, in the code, the structure for 'buf1' and 'buf2' specifies compiler attribute with 64-byte alignment. The problem happens for 'sh_offset', libelf returns it as 0x3028 which is not 64-byte aligned, combining with disassembly, it's likely libelf doesn't respect the alignment for .bss section, therefore, it doesn't return the aligned value for 'sh_offset'. Suggested by Fangrui Song, ELF file contains program header which contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD segments contain the execution info. A better choice for converting memory address to file address is using the formula: file_address = st_value - p_vaddr + p_offset This patch introduces elf_read_program_header() which returns the program header based on the passed 'st_value', then it uses the formula above to calculate the symbol file address; and the debugging log is updated respectively. After applying the change: # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf2 0x30c0-0x3100 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf1 0x3080-0x30c0 ... Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing") Reported-by: Chang Rui <changruinj@gmail.com> Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-12	selftests: forwarding: fix error message in learning_test	Vladimir Oltean	1	-1/+1
	[ Upstream commit 83844aacab2015da1dba1df0cc61fc4b4c4e8076 ] When packets are not received, they aren't received on $host1_if, so the message talking about the second host not receiving them is incorrect. Fix it. Fixes: d4deb01467ec ("selftests: forwarding: Add a test for FDB learning") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-12	selftests: forwarding: fix learning_test when h1 supports IFF_UNICAST_FLT	Vladimir Oltean	1	-0/+2
	[ Upstream commit 1a635d3e1c80626237fdae47a5545b6655d8d81c ] The first host interface has by default no interest in receiving packets MAC DA de:ad:be:ef:13:37, so it might drop them before they hit the tc filter and this might confuse the selftest. Enable promiscuous mode such that the filter properly counts received packets. Fixes: d4deb01467ec ("selftests: forwarding: Add a test for FDB learning") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-12	selftests: forwarding: fix flood_unicast_test when h2 supports IFF_UNICAST_FLT	Vladimir Oltean	1	-0/+2
	[ Upstream commit b8e629b05f5d23f9649c901bef09fab8b0c2e4b9 ] As mentioned in the blamed commit, flood_unicast_test() works by checking the match count on a tc filter placed on the receiving interface. But the second host interface (host2_if) has no interest in receiving a packet with MAC DA de:ad:be:ef:13:37, so its RX filter drops it even before the ingress tc filter gets to be executed. So we will incorrectly get the message "Packet was not flooded when should", when in fact, the packet was flooded as expected but dropped due to an unrelated reason, at some other layer on the receiving side. Force h2 to accept this packet by temporarily placing it in promiscuous mode. Alternatively we could either deliver to its MAC address or use tcpdump_start, but this has the fewest complications. This fixes the "flooding" test from bridge_vlan_aware.sh and bridge_vlan_unaware.sh, which calls flood_test from the lib. Fixes: 236dd50bf67a ("selftests: forwarding: Add a test for flooded traffic") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-07	selftests/rseq: Change type of rseq_offset to ptrdiff_t	Mathieu Desnoyers	3	-10/+12
	commit 889c5d60fbcf332c8b6ab7054d45f2768914a375 upstream. Just before the 2.35 release of glibc, the __rseq_offset userspace ABI was changed from int to ptrdiff_t. Adapt to this change in the kernel selftests. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://sourceware.org/pipermail/libc-alpha/2022-February/136024.html Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: x86-32: use %gs segment selector for accessing rseq thread area	Mathieu Desnoyers	1	-32/+34
	commit 127b6429d235ab7c358223bbfd8a8b8d8cc799b6 upstream. Rather than use rseq_get_abi() and pass its result through a register to the inline assembler, directly access the per-thread rseq area through a memory reference combining the %gs segment selector, the constant offset of the field in struct rseq, and the rseq_offset value (in a register). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-16-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: x86-64: use %fs segment selector for accessing rseq thread area	Mathieu Desnoyers	1	-28/+30
	commit 4e15bb766b6c6e963a4d33629034d0ec3b7637df upstream. Rather than use rseq_get_abi() and pass its result through a register to the inline assembler, directly access the per-thread rseq area through a memory reference combining the %fs segment selector, the constant offset of the field in struct rseq, and the rseq_offset value (in a register). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-15-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Fix: work-around asm goto compiler bugs	Mathieu Desnoyers	7	-6/+245
	commit b53823fb2ef854222853be164f3b1e815f315144 upstream. gcc and clang each have their own compiler bugs with respect to asm goto. Implement a work-around for compiler versions known to have those bugs. gcc prior to 4.8.2 miscompiles asm goto. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 gcc prior to 8.1.0 miscompiles asm goto at O1. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103908 clang prior to version 13.0.1 miscompiles asm goto at O2. https://github.com/llvm/llvm-project/issues/52735 Work around these issues by adding a volatile inline asm with memory clobber in the fallthrough after the asm goto and at each label target. Emit this for all compilers in case other similar issues are found in the future. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-14-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Remove arm/mips asm goto compiler work-around	Mathieu Desnoyers	2	-74/+0
	commit 94c5cf2a0e193afffef8de48ddc42de6df7cac93 upstream. The arm and mips work-around for asm goto size guess issues are not properly documented, and lack reference to specific compiler versions, upstream compiler bug tracker entry, and reproducer. I can only find a loosely documented patch in my original LKML rseq post refering to gcc < 7 on ARM, but it does not appear to be sufficient to track the exact issue. Also, I am not sure MIPS really has the same limitation. Therefore, remove the work-around until we can properly document this. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20171121141900.18471-17-mathieu.desnoyers@efficios.com/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Fix warnings about #if checks of undefined tokens	Mathieu Desnoyers	2	-2/+2
	commit d7ed99ade3e62b755584eea07b4e499e79240527 upstream. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-12-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Fix ppc32 offsets by using long rather than off_t	Mathieu Desnoyers	9	-11/+11
	commit 26dc8a6d8e11552f3b797b5aafe01071ca32d692 upstream. The semantic of off_t is for file offsets. We mean to use it as an offset from a pointer. We really expect it to fit in a single register, and not use a 64-bit type on 32-bit architectures. Fix runtime issues on ppc32 where the offset is always 0 due to inconsistency between the argument type (off_t -> 64-bit) and type expected by the inline assembler (32-bit). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-11-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Fix ppc32 missing instruction selection "u" and "x" for ↵	Mathieu Desnoyers	1	-27/+28
	load/store commit de6b52a21420a18dc8a36438d581efd1313d5fe3 upstream. Building the rseq basic test with gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) Target: powerpc-linux-gnu leads to these errors: /tmp/ccieEWxU.s: Assembler messages: /tmp/ccieEWxU.s:118: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:118: Error: junk at end of line: `,8' /tmp/ccieEWxU.s:121: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:121: Error: junk at end of line: `,8' /tmp/ccieEWxU.s:626: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:626: Error: junk at end of line: `,8' /tmp/ccieEWxU.s:629: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:629: Error: junk at end of line: `,8' /tmp/ccieEWxU.s:735: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:735: Error: junk at end of line: `,8' /tmp/ccieEWxU.s:738: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:738: Error: junk at end of line: `,8' /tmp/ccieEWxU.s:741: Error: syntax error; found `,', expected `(' /tmp/ccieEWxU.s:741: Error: junk at end of line: `,8' Makefile:581: recipe for target 'basic_percpu_ops_test.o' failed Based on discussion with Linux powerpc maintainers and review of the use of the "m" operand in powerpc kernel code, add the missing %Un%Xn (where n is operand number) to the lwz, stw, ld, and std instructions when used with "m" operands. Using "WORD" to mean either a 32-bit or 64-bit type depending on the architecture is misleading. The term "WORD" really means a 32-bit type in both 32-bit and 64-bit powerpc assembler. The intent here is to wrap load/store to intptr_t into common macros for both 32-bit and 64-bit. Rename the macros with a RSEQ_ prefix, and use the terms "INT" for always 32-bit type, and "LONG" for architecture bitness-sized type. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-10-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Fix ppc32: wrong rseq_cs 32-bit field pointer on big endian	Mathieu Desnoyers	5	-38/+38
	commit 24d1136a29da5953de5c0cbc6c83eb62a1e0bf14 upstream. ppc32 incorrectly uses padding as rseq_cs pointer field. Fix this by using the rseq_cs.arch.ptr field. Use this field across all architectures. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-9-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35	Mathieu Desnoyers	3	-88/+88
	commit 233e667e1ae3e348686bd9dd0172e62a09d852e1 upstream. glibc-2.35 (upcoming release date 2022-02-01) exposes the rseq per-thread data in the TCB, accessible at an offset from the thread pointer, rather than through an actual Thread-Local Storage (TLS) variable, as the Linux kernel selftests initially expected. The __rseq_abi TLS and glibc-2.35's ABI for per-thread data cannot actively coexist in a process, because the kernel supports only a single rseq registration per thread. Here is the scheme introduced to ensure selftests can work both with an older glibc and with glibc-2.35+: - librseq exposes its own "rseq_offset, rseq_size, rseq_flags" ABI. - librseq queries for glibc rseq ABI (__rseq_offset, __rseq_size, __rseq_flags) using dlsym() in a librseq library constructor. If those are found, copy their values into rseq_offset, rseq_size, and rseq_flags. - Else, if those glibc symbols are not found, handle rseq registration from librseq and use its own IE-model TLS to implement the rseq ABI per-thread storage. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-8-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Introduce thread pointer getters	Mathieu Desnoyers	4	-0/+114
	commit 886ddfba933f5ce9d76c278165d834d114ba4ffc upstream. This is done in preparation for the selftest uplift to become compatible with glibc-2.35. glibc-2.35 exposes the rseq per-thread data in the TCB, accessible at an offset from the thread pointer. The toolchains do not implement accessing the thread pointer on all architectures. Provide thread pointer getters for ppc and x86 which lack (or lacked until recently) toolchain support. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-7-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Introduce rseq_get_abi() helper	Mathieu Desnoyers	7	-94/+99
	commit e546cd48ccc456074ddb8920732aef4af65d7ca7 upstream. This is done in preparation for the selftest uplift to become compatible with glibc-2.35. glibc-2.35 exposes the rseq per-thread data in the TCB, accessible at an offset from the thread pointer, rather than through an actual Thread-Local Storage (TLS) variable, as the kernel selftests initially expected. Introduce a rseq_get_abi() helper, initially using the __rseq_abi TLS variable, in preparation for changing this userspace ABI for one which is compatible with glibc-2.35. Note that the __rseq_abi TLS and glibc-2.35's ABI for per-thread data cannot actively coexist in a process, because the kernel supports only a single rseq registration per thread. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-6-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Remove volatile from __rseq_abi	Mathieu Desnoyers	2	-4/+4
	commit 94b80a19ebfe347a01301d750040a61c38200e2b upstream. This is done in preparation for the selftest uplift to become compatible with glibc-2.35. All accesses to the __rseq_abi fields are volatile, but remove the volatile from the TLS variable declaration, otherwise we are stuck with volatile for the upcoming rseq_get_abi() helper. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-5-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: Remove useless assignment to cpu variable	Mathieu Desnoyers	1	-3/+1
	commit 930378d056eac2c96407b02aafe4938d0ac9cc37 upstream. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-4-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: introduce own copy of rseq uapi header	Mathieu Desnoyers	3	-14/+161
	commit 5c105d55a9dc9e01535116ccfc26e703168a574f upstream. The Linux kernel rseq uapi header has a broken layout for the rseq_cs.ptr field on 32-bit little endian architectures. The entire rseq_cs.ptr field is planned for removal, leaving only the 64-bit rseq_cs.ptr64 field available. Both glibc and librseq use their own copy of the Linux kernel uapi header, where they introduce proper union fields to access to the 32-bit low order bits of the rseq_cs pointer on 32-bit architectures. Introduce a copy of the Linux kernel uapi headers in the Linux kernel selftests. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220124171253.22072-2-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/rseq: remove ARRAY_SIZE define from individual tests	Shuah Khan	2	-4/+2
	commit 07ad4f7629d4802ff0d962b0ac23ea6445964e2a upstream. ARRAY_SIZE is defined in several selftests. Remove definitions from individual test files and include header file for the define instead. ARRAY_SIZE define is added in a separate patch to prepare for this change. Remove ARRAY_SIZE from rseq tests and pickup the one defined in kselftest.h. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	rseq/selftests,x86_64: Add rseq_offset_deref_addv()	Peter Oskolkov	1	-0/+57
	commit ea366dd79c05fcd4cf5e225d2de8a3a7c293160c upstream. This patch adds rseq_offset_deref_addv() function to tools/testing/selftests/rseq/rseq-x86.h, to be used in a selftest in the next patch in the patchset. Once an architecture adds support for this function they should define "RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV". Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lkml.kernel.org/r/20200923233618.2572849-2-posk@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07	selftests/net: pass ipv6_args to udpgso_bench's IPv6 TCP test	Dimitris Michailidis	1	-1/+1
	commit b968080808f7f28b89aa495b7402ba48eb17ee93 upstream. udpgso_bench.sh has been running its IPv6 TCP test with IPv4 arguments since its initial conmit. Looks like a typo. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Cc: willemb@google.com Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20220623000234.61774-1-dmichail@fungible.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14	netfilter: nat: really support inet nat without l3 address	Florian Westphal	1	-0/+43
	[ Upstream commit 282e5f8fe907dc3f2fbf9f2103b0e62ffc3a68a5 ] When no l3 address is given, priv->family is set to NFPROTO_INET and the evaluation function isn't called. Call it too so l4-only rewrite can work. Also add a test case for this. Fixes: a33f387ecd5aa ("netfilter: nft_nat: allow to specify layer 4 protocol NAT only") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14	perf c2c: Fix sorting in percent_rmt_hitm_cmp()	Leo Yan	1	-2/+2
	[ Upstream commit b24192a17337abbf3f44aaa75e15df14a2d0016e ] The function percent_rmt_hitm_cmp() wrongly uses local HITMs for sorting remote HITMs. Since this function is to sort cache lines for remote HITMs, this patch changes to use 'rmt_hitm' field for correct sorting. Fixes: 9cb3500afc0980c5 ("perf c2c report: Add hitm/store percent related sort keys") Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220530084253.750190-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14	perf jevents: Fix event syntax error caused by ExtSel	Zhengjun Xing	1	-1/+1
	[ Upstream commit f4df0dbbe62ee8e4405a57b27ccd54393971c773 ] In the origin code, when "ExtSel" is 1, the eventcode will change to "eventcode \|= 1 << 21”. For event “UNC_Q_RxL_CREDITS_CONSUMED_VN0.DRS", its "ExtSel" is "1", its eventcode will change from 0x1E to 0x20001E, but in fact the eventcode should <=0x1FF, so this will cause the parse fail: # perf stat -e "UNC_Q_RxL_CREDITS_CONSUMED_VN0.DRS" -a sleep 0.1 event syntax error: '.._RxL_CREDITS_CONSUMED_VN0.DRS' \___ value too big for format, maximum is 511 On the perf kernel side, the kernel assumes the valid bits are continuous. It will adjust the 0x100 (bit 8 for perf tool) to bit 21 in HW. DEFINE_UNCORE_FORMAT_ATTR(event_ext, event, "config:0-7,21"); So the perf tool follows the kernel side and just set bit8 other than bit21. Fixes: fedb2b518239cbc0 ("perf jevents: Add support for parsing uncore json files") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220525140410.1706851-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14	perf c2c: Use stdio interface if slang is not supported	Leo Yan	1	-2/+4
	[ Upstream commit c4040212bc97d16040712a410335f93bc94d2262 ] If the slang lib is not installed on the system, perf c2c tool disables TUI mode and roll back to use stdio mode; but the flag 'c2c.use_stdio' is missed to set true and thus it wrongly applies UI quirks in the function ui_quirks(). This commit forces to use stdio interface if slang is not supported, and it can avoid to apply the UI quirks and show the correct metric header. Before: ================================================= Shared Cache Line Distribution Pareto ================================================= ------------------------------------------------------------------------------- 0 0 0 99 0 0 0 0xaaaac17d6000 ------------------------------------------------------------------------------- 0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0 0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0 After: ================================================= Shared Cache Line Distribution Pareto ================================================= ------------------------------------------------------------------------------- 0 0 0 99 0 0 0 0xaaaac17d6000 ------------------------------------------------------------------------------- 0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0 0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0 Fixes: 5a1a99cd2e4e1557 ("perf c2c report: Add main TUI browser") Reported-by: Joe Mario <jmario@redhat.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220526145400.611249-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14	perf tools: Add missing headers needed by util/data.h	Yang Jihong	1	-0/+1
	[ Upstream commit 4d27cf1d9de5becfa4d1efb2ea54dba1b9fc962a ] 'struct perf_data' in util/data.h uses the "u64" data type, which is defined in "linux/types.h". If we only include util/data.h, the following compilation error occurs: util/data.h:38:3: error: unknown type name ‘u64’ u64 version; ^~~ Solution: include "linux/types.h." to add the needed type definitions. Fixes: 258031c017c353e8 ("perf header: Add DIR_FORMAT feature to describe directory data") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220429090539.212448-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14	selftests/bpf: fix btf_dump/btf_dump due to recent clang change	Yonghong Song	1	-1/+1
	[ Upstream commit 4050764cbaa25760aab40857f723393c07898474 ] Latest llvm-project upstream had a change of behavior related to qualifiers on function return type ([1]). This caused selftests btf_dump/btf_dump failure. The following example shows what changed. $ cat t.c typedef const char * const (* const (* const fn_ptr_arr2_t[5])())(char * ()(int)); struct t { int a; fn_ptr_arr2_t l; }; int foo(struct t arg) { return arg->a; } Compiled with latest upstream llvm15, $ clang -O2 -g -target bpf -S -emit-llvm t.c The related generated debuginfo IR looks like: !16 = !DIDerivedType(tag: DW_TAG_typedef, name: "fn_ptr_arr2_t", file: !1, line: 1, baseType: !17) !17 = !DICompositeType(tag: DW_TAG_array_type, baseType: !18, size: 320, elements: !32) !18 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !19) !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64) !20 = !DISubroutineType(types: !21) !21 = !{!22, null} !22 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !23, size: 64) !23 = !DISubroutineType(types: !24) !24 = !{!25, !28} !25 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !26, size: 64) !26 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !27) !27 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char) You can see two intermediate const qualifier to pointer are dropped in debuginfo IR. With llvm14, we have following debuginfo IR: !16 = !DIDerivedType(tag: DW_TAG_typedef, name: "fn_ptr_arr2_t", file: !1, line: 1, baseType: !17) !17 = !DICompositeType(tag: DW_TAG_array_type, baseType: !18, size: 320, elements: !34) !18 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !19) !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64) !20 = !DISubroutineType(types: !21) !21 = !{!22, null} !22 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !23) !23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !24, size: 64) !24 = !DISubroutineType(types: !25) !25 = !{!26, !30} !26 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !27) !27 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !28, size: 64) !28 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !29) !29 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char) All const qualifiers are preserved. To adapt the selftest to both old and new llvm, this patch removed the intermediate const qualifier in const-to-ptr types, to make the test succeed again. [1] https://reviews.llvm.org/D125919 Reported-by: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220523152044.3905809-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14	tools/power turbostat: fix ICX DRAM power numbers	Len Brown	1	-0/+1
	[ Upstream commit 6397b6418935773a34b533b3348b03f4ce3d7050 ] ICX (and its duplicates) require special hard-coded DRAM RAPL units, rather than using the generic RAPL energy units. Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25	x86/xen: Mark cpu_bringup_and_idle() as dead_end_function	Peter Zijlstra	1	-0/+1
	commit 9af9dcf11bda3e2c0e24c1acaacb8685ad974e93 upstream. The asm_cpu_bringup_and_idle() function is required to push the return value on the stack in order to make ORC happy, but the only reason objtool doesn't complain is because of a happy accident. The thing is that asm_cpu_bringup_and_idle() doesn't return, so validate_branch() never terminates and falls through to the next function, which in the normal case is the hypercall_page. And that, as it happens, is 4095 NOPs and a RET. Make asm_cpu_bringup_and_idle() terminate on it's own, by making the function it calls as a dead-end. This way we no longer rely on what code happens to come after. Fixes: c3881eb58d56 ("x86/xen: Make the secondary CPU idle tasks reliable") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/20210624095147.693801717@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25	selftests: add ping test with ping_group_range tuned	Nicolas Dichtel	1	-0/+12
	[ Upstream commit e71b7f1f44d3d88c677769c85ef0171caf9fc89f ] The 'ping' utility is able to manage two kind of sockets (raw or icmp), depending on the sysctl ping_group_range. By default, ping_group_range is set to '1 0', which forces ping to use an ip raw socket. Let's replay the ping tests by allowing 'ping' to use the ip icmp socket. After the previous patch, ipv4 tests results are the same with both kinds of socket. For ipv6, there are a lot a new failures (the previous patch fixes only two cases). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25	perf bench numa: Address compiler error on s390	Thomas Richter	1	-1/+1
	[ Upstream commit f8ac1c478424a9a14669b8cef7389b1e14e5229d ] The compilation on s390 results in this error: # make DEBUG=y bench/numa.o ... bench/numa.c: In function ‘__bench_numa’: bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size between 10 and 20 [-Werror=format-truncation=] 1749 \| snprintf(tname, sizeof(tname), "process%d:thread%d", p, t); ^~ ... bench/numa.c:1749:64: note: directive argument in the range [-2147483647, 2147483646] ... # The maximum length of the %d replacement is 11 characters because of the negative sign. Therefore extend the array by two more characters. Output after: # make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o -rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o # Fixes: 3aff8ba0a4c9c919 ("perf bench numa: Avoid possible truncation when using snprintf()") Suggested-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-15	x86: xen: insn: Decode Xen and KVM emulate-prefix signature	Masami Hiramatsu	5	-2/+58
	commit 4d65adfcd1196818659d3bd9b42dccab291e1751 upstream. Decode Xen and KVM's emulate-prefix signature by x86 insn decoder. It is called "prefix" but actually not x86 instruction prefix, so this adds insn.emulate_prefix_size field instead of reusing insn.prefixes. If x86 decoder finds a special sequence of instructions of XEN_EMULATE_PREFIX and 'ud2a; .ascii "kvm"', it just counts the length, set insn.emulate_prefix_size and fold it with the next instruction. In other words, the signature and the next instruction is treated as a single instruction. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: x86@kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Borislav Petkov <bp@alien8.de> Cc: xen-devel@lists.xenproject.org Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/156777564986.25081.4964537658500952557.stgit@devnote2 [mheyne: resolved contextual conflict in tools/objtools/sync-check.sh] Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>