summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2025-06-27bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETEPaul Chaignon1-0/+2
commit ead7f9b8de65632ef8060b84b0c55049a33cfea1 upstream. In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other things, update the L4 checksum after reverse SNATing IPv6 packets. That use case is however not currently supported and leads to invalid skb->csum values in some cases. This patch adds support for IPv6 address changes in bpf_l4_csum_update via a new flag. When calling bpf_l4_csum_replace in Cilium, it ends up calling inet_proto_csum_replace_by_diff: 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb, 2: __wsum diff, bool pseudohdr) 3: { 4: if (skb->ip_summed != CHECKSUM_PARTIAL) { 5: csum_replace_by_diff(sum, diff); 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr) 7: skb->csum = ~csum_sub(diff, skb->csum); 8: } else if (pseudohdr) { 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum))); 10: } 11: } The bug happens when we're in the CHECKSUM_COMPLETE state. We've just updated one of the IPv6 addresses. The helper now updates the L4 header checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't. For an IPv6 packet, the updates of the IPv6 address and of the L4 checksum will cancel each other. The checksums are set such that computing a checksum over the packet including its checksum will result in a sum of 0. So the same is true here when we update the L4 checksum on line 5. We'll update it as to cancel the previous IPv6 address update. Hence skb->csum should remain untouched in this case. The same bug doesn't affect IPv4 packets because, in that case, three fields are updated: the IPv4 address, the IP checksum, and the L4 checksum. The change to the IPv4 address and one of the checksums still cancel each other in skb->csum, but we're left with one checksum update and should therefore update skb->csum accordingly. That's exactly what inet_proto_csum_replace_by_diff does. This special case for IPv6 L4 checksums is also described atop inet_proto_csum_replace16, the function we should be using in this case. This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6, to indicate that we're updating the L4 checksum of an IPv6 packet. When the flag is set, inet_proto_csum_replace_by_diff will skip the skb->csum update. Fixes: 7d672345ed295 ("bpf: add generic bpf_csum_diff helper") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27selftests/x86: Add a test to detect infinite SIGTRAP handler loopXin Li (Intel)2-1/+102
commit f287822688eeb44ae1cf6ac45701d965efc33218 upstream. When FRED is enabled, if the Trap Flag (TF) is set without an external debugger attached, it can lead to an infinite loop in the SIGTRAP handler. To avoid this, the software event flag in the augmented SS must be cleared, ensuring that no single-step trap remains pending when ERETU completes. This test checks for that specific scenario—verifying whether the kernel correctly prevents an infinite SIGTRAP loop in this edge case when FRED is enabled. The test should _always_ pass with IDT event delivery, thus no need to disable the test even when FRED is not enabled. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250609084054.2083189-3-xin%40zytor.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27libbpf: Add identical pointer detection to btf_dedup_is_equiv()Alan Maguire1-0/+16
[ Upstream commit 8e64c387c942229c551d0f23de4d9993d3a2acb6 ] Recently as a side-effect of commit ac053946f5c4 ("compiler.h: introduce TYPEOF_UNQUAL() macro") issues were observed in deduplication between modules and kernel BTF such that a large number of kernel types were not deduplicated so were found in module BTF (task_struct, bpf_prog etc). The root cause appeared to be a failure to dedup struct types, specifically those with members that were pointers with __percpu annotations. The issue in dedup is at the point that we are deduplicating structures, we have not yet deduplicated reference types like pointers. If multiple copies of a pointer point at the same (deduplicated) integer as in this case, we do not see them as identical. Special handling already exists to deal with structures and arrays, so add pointer handling here too. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250429161042.2069678-1-alan.maguire@oracle.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27tools/resolve_btfids: Fix build when cross compiling kernel with clang.Suleiman Souhlal1-1/+1
commit a298bbab903e3fb4cbe16d36d6195e68fad1b776 upstream. When cross compiling the kernel with clang, we need to override CLANG_CROSS_FLAGS when preparing the step libraries. Prior to commit d1d096312176 ("tools: fix annoying "mkdir -p ..." logs when building tools in parallel"), MAKEFLAGS would have been set to a value that wouldn't set a value for CLANG_CROSS_FLAGS, hiding the fact that we weren't properly overriding it. Fixes: 56a2df7615fa ("tools/resolve_btfids: Compile resolve_btfids as host program") Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20250606074538.1608546-1-suleiman@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27perf record: Fix incorrect --user-regs commentsDapeng Mi1-1/+1
[ Upstream commit a4a859eb6704a8aa46aa1cec5396c8d41383a26b ] The comment of "--user-regs" option is not correct, fix it. "on interrupt," -> "in user space," Fixes: 84c417422798c897 ("perf record: Support direct --user-regs arguments") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403060810.196028-1-dapeng1.mi@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27perf tests switch-tracking: Fix timestamp comparisonLeo Yan1-1/+1
[ Upstream commit 628e124404b3db5e10e17228e680a2999018ab33 ] The test might fail on the Arm64 platform with the error: # perf test -vvv "Track with sched_switch" Missing sched_switch events # The issue is caused by incorrect handling of timestamp comparisons. The comparison result, a signed 64-bit value, was being directly cast to an int, leading to incorrect sorting for sched events. The case does not fail everytime, usually I can trigger the failure after run 20 ~ 30 times: # while true; do perf test "Track with sched_switch"; done 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : FAILED! 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok 106: Track with sched_switch : FAILED! 106: Track with sched_switch : Ok 106: Track with sched_switch : Ok I used cross compiler to build Perf tool on my host machine and tested on Debian / Juno board. Generally, I think this issue is not very specific to GCC versions. As both internal CI and my local env can reproduce the issue. My Host Build compiler: # aarch64-linux-gnu-gcc --version aarch64-linux-gnu-gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 Juno Board: # lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux 12 (bookworm) Release: 12 Codename: bookworm Fix this by explicitly returning 0, 1, or -1 based on whether the result is zero, positive, or negative. Fixes: d44bc558297222d9 ("perf tests: Add a test for tracking with sched_switch") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250331172759.115604-1-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27perf scripts python: exported-sql-viewer.py: Fix pattern matching with Python 3Adrian Hunter1-1/+4
[ Upstream commit 17e548405a81665fd14cee960db7d093d1396400 ] The script allows the user to enter patterns to find symbols. The pattern matching characters are converted for use in SQL. For PostgreSQL the conversion involves using the Python maketrans() method which is slightly different in Python 3 compared with Python 2. Fix to work in Python 3. Fixes: beda0e725e5f06ac ("perf script python: Add Python3 support to exported-sql-viewer.py") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tony Jones <tonyj@suse.de> Link: https://lore.kernel.org/r/20250512093932.79854-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27perf intel-pt: Fix PEBS-via-PT data_srcAdrian Hunter1-3/+202
[ Upstream commit e00eac6b5b6d956f38d8880c44bf7fd9954063c3 ] The Fixes commit did not add support for decoding PEBS-via-PT data_src. Fix by adding support. PEBS-via-PT is a feature of some E-core processors, starting with processors based on Tremont microarchitecture. Because the kernel only supports Intel PT features that are on all processors, there is no support for PEBS-via-PT on hybrids. Currently that leaves processors based on Tremont, Gracemont and Crestmont, however there are no events on Tremont that produce data_src information, and for Gracemont and Crestmont there are only: mem-loads event=0xd0,umask=0x5,ldlat=3 mem-stores event=0xd0,umask=0x6 Affected processors include Alder Lake N (Gracemont), Sierra Forest (Crestmont) and Grand Ridge (Crestmont). Example: # perf record -d -e intel_pt/branch=0/ -e mem-loads/aux-output/pp uname Before: # perf.before script --itrace=o -Fdata_src 0 |OP No|LVL N/A|SNP N/A|TLB N/A|LCK No|BLK N/A 0 |OP No|LVL N/A|SNP N/A|TLB N/A|LCK No|BLK N/A After: # perf script --itrace=o -Fdata_src 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 10450100442 |OP LOAD|LVL L2 hit|SNP None|TLB L2 miss|LCK No|BLK N/A Fixes: 975846eddf907297 ("perf intel-pt: Add memory information to synthesized PEBS sample") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250512093932.79854-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27perf ui browser hists: Set actions->thread before calling do_zoom_thread()Arnaldo Carvalho de Melo1-1/+1
[ Upstream commit 1741189d843a1d5ef38538bc52a3760e2e46cb2e ] In 7cecb7fe8388d5c3 ("perf hists: Move sort__has_comm into struct perf_hpp_list") it assumes that act->thread is set prior to calling do_zoom_thread(). This doesn't happen when we use ESC or the Left arrow key to Zoom out of a specific thread, making this operation not to work and we get stuck into the thread zoom. In 6422184b087ff435 ("perf hists browser: Simplify zooming code using pstack_peek()") it says no need to set actions->thread, and at that point that was true, but in 7cecb7fe8388d5c3 a actions->thread == NULL check was added before the zoom out of thread could kick in. We can zoom out using the alternative 't' thread zoom toggle hotkey to finally set actions->thread before calling do_zoom_thread() and zoom out, but lets also fix the ESC/Zoom out of thread case. Fixes: 7cecb7fe8388d5c3 ("perf hists: Move sort__has_comm into struct perf_hpp_list") Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Z_TYux5fUg2pW-pF@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27perf build: Warn when libdebuginfod devel files are not availableArnaldo Carvalho de Melo1-0/+2
[ Upstream commit 4fce4b91fd1aabb326c46e237eb4b19ab72598f8 ] While working on 'perf version --build-options' I noticed that: $ perf version --build-options perf version 6.15.rc1.g312a07a00d31 aio: [ on ] # HAVE_AIO_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT bpf_skeletons: [ on ] # HAVE_BPF_SKEL debuginfod: [ OFF ] # HAVE_DEBUGINFOD_SUPPORT <SNIP> And looking at tools/perf/Makefile.config I also noticed that it is not opt-in, meaning we will attempt to build with it in all normal cases. So add the usual warning at build time to let the user know that something recommended is missing, now we see: Makefile.config:563: No elfutils/debuginfod.h found, no debuginfo server support, please install elfutils-debuginfod-client-devel or equivalent And after following the recommendation: $ perf check feature debuginfod debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT $ ldd ~/bin/perf | grep debuginfo libdebuginfod.so.1 => /lib64/libdebuginfod.so.1 (0x00007fee5cf5f000) $ With this feature on several perf tools will fetch what is needed and not require all the contents of the debuginfo packages, for instance: # rpm -qa | grep kernel-debuginfo # pahole --running_kernel_vmlinux pahole: couldn't find a vmlinux that matches the running kernel HINT: Maybe you're inside a container or missing a debuginfo package? # # perf trace -e open* perf probe --vars icmp_rcv 0.000 ( 0.005 ms): perf/97391 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) = 3 0.014 ( 0.004 ms): perf/97391 openat(dfd: CWD, filename: "/lib64/libm.so.6", flags: RDONLY|CLOEXEC) = 3 <SNIP> 32130.100 ( 0.008 ms): perf/97391 openat(dfd: CWD, filename: "/root/.cache/debuginfod_client/aa3c82b4a13f9c0e0301bebb20fe958c4db6f362/debuginfo") = 3 <SNIP> Available variables at icmp_rcv @<icmp_rcv+0> struct sk_buff* skb <SNIP> # # pahole --running_kernel_vmlinux /root/.cache/debuginfod_client/aa3c82b4a13f9c0e0301bebb20fe958c4db6f362/debuginfo # file /root/.cache/debuginfod_client/aa3c82b4a13f9c0e0301bebb20fe958c4db6f362/debuginfo /root/.cache/debuginfod_client/aa3c82b4a13f9c0e0301bebb20fe958c4db6f362/debuginfo: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=aa3c82b4a13f9c0e0301bebb20fe958c4db6f362, with debug_info, not stripped # ls -la /root/.cache/debuginfod_client/aa3c82b4a13f9c0e0301bebb20fe958c4db6f362/debuginfo -r--------. 1 root root 475401512 Mar 27 21:00 /root/.cache/debuginfod_client/aa3c82b4a13f9c0e0301bebb20fe958c4db6f362/debuginfo # Then, cached: # perf stat --null perf probe --vars icmp_rcv Available variables at icmp_rcv @<icmp_rcv+0> struct sk_buff* skb Performance counter stats for 'perf probe --vars icmp_rcv': 0.671389041 seconds time elapsed 0.519176000 seconds user 0.150860000 seconds sys Fixes: c7a14fdcb3fa7736 ("perf build-ids: Fall back to debuginfod query if debuginfo not found") Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Link: https://lore.kernel.org/r/Z_dkNDj9EPFwPqq1@gmail.com [ Folded patch from Ingo to have the debian/ubuntu devel package added build warning message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27libbpf: Use proper errno value in nlattrAnton Protopopov1-8/+7
[ Upstream commit fd5fd538a1f4b34cee6823ba0ddda2f7a55aca96 ] Return value of the validate_nla() function can be propagated all the way up to users of libbpf API. In case of error this libbpf version of validate_nla returns -1 which will be seen as -EPERM from user's point of view. Instead, return a more reasonable -EINVAL. Fixes: bbf48c18ee0c ("libbpf: add error reporting in XDP") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250510182011.2246631-1-a.s.protopopov@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27bpf: Fix uninitialized values in BPF_{CORE,PROBE}_READAnton Protopopov1-0/+6
[ Upstream commit 41d4ce6df3f4945341ec509a840cc002a413b6cc ] With the latest LLVM bpf selftests build will fail with the following error message: progs/profiler.inc.h:710:31: error: default initialization of an object of type 'typeof ((parent_task)->real_cred->uid.val)' (aka 'const unsigned int') leaves the object uninitialized and is incompatible with C++ [-Werror,-Wdefault-const-init-unsafe] 710 | proc_exec_data->parent_uid = BPF_CORE_READ(parent_task, real_cred, uid.val); | ^ tools/testing/selftests/bpf/tools/include/bpf/bpf_core_read.h:520:35: note: expanded from macro 'BPF_CORE_READ' 520 | ___type((src), a, ##__VA_ARGS__) __r; \ | ^ This happens because BPF_CORE_READ (and other macro) declare the variable __r using the ___type macro which can inherit const modifier from intermediate types. Fix this by using __typeof_unqual__, when supported. (And when it is not supported, the problem shouldn't appear, as older compilers haven't complained.) Fixes: 792001f4f7aa ("libbpf: Add user-space variants of BPF_CORE_READ() family of macros") Fixes: a4b09a9ef945 ("libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250502193031.3522715-1-a.s.protopopov@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27libbpf: Use proper errno value in linkerAnton Protopopov1-2/+2
[ Upstream commit 358b1c0f56ebb6996fcec7dcdcf6bae5dcbc8b6c ] Return values of the linker_append_sec_data() and the linker_append_elf_relos() functions are propagated all the way up to users of libbpf API. In some error cases these functions return -1 which will be seen as -EPERM from user's point of view. Instead, return a more reasonable -EINVAL. Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250430120820.2262053-1-a.s.protopopov@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27libbpf: Fix buffer overflow in bpf_object__init_progViktor Malik1-1/+1
[ Upstream commit ee684de5c1b0ac01821320826baec7da93f3615b ] As shown in [1], it is possible to corrupt a BPF ELF file such that arbitrary BPF instructions are loaded by libbpf. This can be done by setting a symbol (BPF program) section offset to a large (unsigned) number such that <section start + symbol offset> overflows and points before the section data in the memory. Consider the situation below where: - prog_start = sec_start + symbol_offset <-- size_t overflow here - prog_end = prog_start + prog_size prog_start sec_start prog_end sec_end | | | | v v v v .....................|################################|............ The report in [1] also provides a corrupted BPF ELF which can be used as a reproducer: $ readelf -S crash Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align ... [ 2] uretprobe.mu[...] PROGBITS 0000000000000000 00000040 0000000000000068 0000000000000000 AX 0 0 8 $ readelf -s crash Symbol table '.symtab' contains 8 entries: Num: Value Size Type Bind Vis Ndx Name ... 6: ffffffffffffffb8 104 FUNC GLOBAL DEFAULT 2 handle_tp Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will point before the actual memory where section 2 is allocated. This is also reported by AddressSanitizer: ================================================================= ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490 READ of size 104 at 0x7c7302fe0000 thread T0 #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76) #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856 #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928 #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930 #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067 #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090 #6 0x000000400c16 in main /poc/poc.c:8 #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4) #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667) #9 0x000000400b34 in _start (/poc/poc+0x400b34) 0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8) allocated by thread T0 here: #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b) #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600) #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018) #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740 The problem here is that currently, libbpf only checks that the program end is within the section bounds. There used to be a check `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was removed by commit 6245947c1b3c ("libbpf: Allow gaps in BPF program sections to support overriden weak functions"). Add a check for detecting the overflow of `sec_off + prog_sz` to bpf_object__init_prog to fix this issue. [1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md Fixes: 6245947c1b3c ("libbpf: Allow gaps in BPF program sections to support overriden weak functions") Reported-by: lmarch2 <2524158037@qq.com> Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27selftests/seccomp: fix syscall_restart test for arm compatNeill Kapron1-2/+5
[ Upstream commit 797002deed03491215a352ace891749b39741b69 ] The inconsistencies in the systcall ABI between arm and arm-compat can can cause a failure in the syscall_restart test due to the logic attempting to work around the differences. The 'machine' field for an ARM64 device running in compat mode can report 'armv8l' or 'armv8b' which matches with the string 'arm' when only examining the first three characters of the string. This change adds additional validation to the workaround logic to make sure we only take the arm path when running natively, not in arm-compat. Fixes: 256d0afb11d6 ("selftests/seccomp: build and pass on arm64") Signed-off-by: Neill Kapron <nkapron@google.com> Link: https://lore.kernel.org/r/20250427094103.3488304-2-nkapron@google.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04bpftool: Fix readlink usage in get_fd_typeViktor Malik1-1/+2
[ Upstream commit 0053f7d39d491b6138d7c526876d13885cbb65f1 ] The `readlink(path, buf, sizeof(buf))` call reads at most sizeof(buf) bytes and *does not* append null-terminator to buf. With respect to that, fix two pieces in get_fd_type: 1. Change the truncation check to contain sizeof(buf) rather than sizeof(path). 2. Append null-terminator to buf. Reported by Coverity. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20250129071857.75182-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04kunit: tool: Use qboot on QEMU x86_64Brendan Jackman1-1/+3
[ Upstream commit 08fafac4c9f289a9d9a22d838921e4b3eb22c664 ] As noted in [0], SeaBIOS (QEMU default) makes a mess of the terminal, qboot does not. It turns out this is actually useful with kunit.py, since the user is exposed to this issue if they set --raw_output=all. qboot is also faster than SeaBIOS, but it's is marginal for this usecase. [0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDURoV7A@mail.gmail.com/ Both SeaBIOS and qboot are x86-specific. Link: https://lore.kernel.org/r/20250124-kunit-qboot-v1-1-815e4d4c6f7c@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04libbpf: Fix out-of-bound readNandakumar Edamana1-1/+1
[ Upstream commit 236d3910117e9f97ebf75e511d8bcc950f1a4e5f ] In `set_kcfg_value_str`, an untrusted string is accessed with the assumption that it will be at least two characters long due to the presence of checks for opening and closing quotes. But the check for the closing quote (value[len - 1] != '"') misses the fact that it could be checking the opening quote itself in case of an invalid input that consists of just the opening quote. This commit adds an explicit check to make sure the string is at least two characters long. Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250221210110.3182084-1-nandakumar@nandakumar.co.in Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04selftests/net: have `gro.sh -t` return a correct exit codeKevin Krakauer1-1/+2
[ Upstream commit 784e6abd99f24024a8998b5916795f0bec9d2fd9 ] Modify gro.sh to return a useful exit code when the -t flag is used. It formerly returned 0 no matter what. Tested: Ran `gro.sh -t large` and verified that test failures return 1. Signed-off-by: Kevin Krakauer <krakauer@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04tools/build: Don't pass test log files to linkerIan Rogers1-1/+5
[ Upstream commit 935e7cb5bb80106ff4f2fe39640f430134ef8cd8 ] Separate test log files from object files. Depend on test log output but don't pass to the linker. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311213628.569562-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04objtool: Properly disable uaccess validationJosh Poimboeuf1-2/+9
[ Upstream commit e1a9dda74dbffbc3fa2069ff418a1876dc99fb14 ] If opts.uaccess isn't set, the uaccess validation is disabled, but only partially: it doesn't read the uaccess_safe_builtin list but still tries to do the validation. Disable it completely to prevent false warnings. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/0e95581c1d2107fb5f59418edf2b26bba38b0cbb.1742852846.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failureIhor Solodrai1-1/+0
[ Upstream commit f2858f308131a09e33afb766cd70119b5b900569 ] "sockmap_ktls disconnect_after_delete" test has been failing on BPF CI after recent merges from netdev: * https://github.com/kernel-patches/bpf/actions/runs/14458537639 * https://github.com/kernel-patches/bpf/actions/runs/14457178732 It happens because disconnect has been disabled for TLS [1], and it renders the test case invalid. Removing all the test code creates a conflict between bpf and bpf-next, so for now only remove the offending assert [2]. The test will be removed later on bpf-next. [1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/ [2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.dev/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22selftests/mm: compaction_test: support platform with huge mount of memoryFeng Tang1-5/+14
commit ab00ddd802f80e31fc9639c652d736fe3913feae upstream. When running mm selftest to verify mm patches, 'compaction_test' case failed on an x86 server with 1TB memory. And the root cause is that it has too much free memory than what the test supports. The test case tries to allocate 100000 huge pages, which is about 200 GB for that x86 server, and when it succeeds, it expects it's large than 1/3 of 80% of the free memory in system. This logic only works for platform with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may raise false alarm for others. Fix it by changing the fixed page number to self-adjustable number according to the real number of free memory. Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Acked-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@inux.alibaba.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22binfmt_elf: Honor PT_LOAD alignment for static PIEKees Cook1-1/+1
[ Upstream commit 3545deff0ec7a37de7ed9632e262598582b140e9 ] The p_align values in PT_LOAD were ignored for static PIE executables (i.e. ET_DYN without PT_INTERP). This is because there is no way to request a non-fixed mmap region with a specific alignment. ET_DYN with PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf performs the ASLR itself, which means it can also apply alignment. For the mmap region, the address selection happens deep within the vm_mmap() implementation (when the requested address is 0). The earlier attempt to implement this: commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") commit 925346c129da ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") did not take into account the different base address origins, and were eventually reverted: aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") In order to get the correct alignment from an mmap base, binfmt_elf must perform a 0-address load first, then tear down the mapping and perform alignment on the resulting address. Since this is slightly more overhead, only do this when it is needed (i.e. the alignment is not the default ELF alignment). This does, however, have the benefit of being able to use MAP_FIXED_NOREPLACE, to avoid potential collisions. With this fixed, enable the static PIE self tests again. Reported-by: H.J. Lu <hjl.tools@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Link: https://lore.kernel.org/r/20240508173149.677910-3-keescook@chromium.org Signed-off-by: Kees Cook <kees@kernel.org> Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22selftests/exec: Build both static and non-static load_address testsKees Cook2-20/+66
[ Upstream commit b57a2907c9d96c56494ef25f8ec821cd0b355dd6 ] After commit 4d1cd3b2c5c1 ("tools/testing/selftests/exec: fix link error"), the load address alignment tests tried to build statically. This was silently ignored in some cases. However, after attempting to further fix the build by switching to "-static-pie", the test started failing. This appears to be due to non-PT_INTERP ET_DYN execs ("static PIE") not doing alignment correctly, which remains unfixed[1]. See commit aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") for more details. Provide rules to build both static and non-static PIE binaries, improve debug reporting, and perform several test steps instead of a single all-or-nothing test. However, do not actually enable static-pie tests; alignment specification is only supported for ET_DYN with PT_INTERP ("regular PIE"). Link: https://bugzilla.kernel.org/show_bug.cgi?id=215275 [1] Link: https://lore.kernel.org/r/20240508173149.677910-1-keescook@chromium.org Signed-off-by: Kees Cook <kees@kernel.org> Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22selftests/exec: load_address: conform test to TAP format outputMuhammad Usama Anjum1-19/+15
[ Upstream commit c4095067736b7ed50316a2bc7c9577941e87ad45 ] Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240304155928.1818928-2-usama.anjum@collabora.com Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02selftests/mincore: Allow read-ahead pages to reach the end of the fileQiuxu Zhuo1-3/+0
[ Upstream commit 197c1eaa7ba633a482ed7588eea6fd4aa57e08d4 ] When running the mincore_selftest on a system with an XFS file system, it failed the "check_file_mmap" test case due to the read-ahead pages reaching the end of the file. The failure log is as below: RUN global.check_file_mmap ... mincore_selftest.c:264:check_file_mmap:Expected i (1024) < vec_size (1024) mincore_selftest.c:265:check_file_mmap:Read-ahead pages reached the end of the file check_file_mmap: Test failed FAIL global.check_file_mmap This is because the read-ahead window size of the XFS file system on this machine is 4 MB, which is larger than the size from the #PF address to the end of the file. As a result, all the pages for this file are populated. blockdev --getra /dev/nvme0n1p5 8192 blockdev --getbsz /dev/nvme0n1p5 512 This issue can be fixed by extending the current FILE_SIZE 4MB to a larger number, but it will still fail if the read-ahead window size of the file system is larger enough. Additionally, in the real world, read-ahead pages reaching the end of the file can happen and is an expected behavior. Therefore, allowing read-ahead pages to reach the end of the file is a better choice for the "check_file_mmap" test case. Link: https://lore.kernel.org/r/20250311080940.21413-1-qiuxu.zhuo@intel.com Reported-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02objtool: Stop UNRET validation on UD2Josh Poimboeuf1-0/+3
[ Upstream commit 9f9cc012c2cbac4833746a0182e06a8eec940d19 ] In preparation for simplifying INSN_SYSCALL, make validate_unret() terminate control flow on UD2 just like validate_branch() already does. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/ce841269e7e28c8b7f32064464a9821034d724ff.1744095216.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02selftests: ublk: fix test_stripe_04Ming Lei1-0/+24
[ Upstream commit 72070e57b0a518ec8e562a2b68fdfc796ef5c040 ] Commit 57ed58c13256 ("selftests: ublk: enable zero copy for stripe target") added test entry of test_stripe_04, but forgot to add the test script. So fix the test by adding the script file. Reported-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250404001849.1443064-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02objtool: Silence more KCOV warningsJosh Poimboeuf1-0/+6
[ Upstream commit 6b023c7842048c4bbeede802f3cf36b96c7a8b25 ] In the past there were issues with KCOV triggering unreachable instruction warnings, which is why unreachable warnings are now disabled with CONFIG_KCOV. Now some new KCOV warnings are showing up with GCC 14: vmlinux.o: warning: objtool: cpuset_write_resmask() falls through to next function cpuset_update_active_cpus.cold() drivers/usb/core/driver.o: error: objtool: usb_deregister() falls through to next function usb_match_device() sound/soc/codecs/snd-soc-wcd934x.o: warning: objtool: .text.wcd934x_slim_irq_handler: unexpected end of section All are caused by GCC KCOV not finishing an optimization, leaving behind a never-taken conditional branch to a basic block which falls through to the next function (or end of section). At a high level this is similar to the unreachable warnings mentioned above, in that KCOV isn't fully removing dead code. Treat it the same way by adding these to the list of warnings to ignore with CONFIG_KCOV. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/66a61a0b65d74e072d3dc02384e395edb2adc3c5.1742852846.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/Z9iTsI09AEBlxlHC@gmail.com Closes: https://lore.kernel.org/oe-kbuild-all/202503180044.oH9gyPeg-lkp@intel.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02selftests/mm: generate a temporary mountpoint for cgroup filesystemMark Brown2-3/+3
[ Upstream commit 9c02223e2d9df5cb37c51aedb78f3960294e09b5 ] Currently if the filesystem for the cgroups version it wants to use is not mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests will attempt to mount it on the hard coded path /dev/cgroup/memory, deleting that directory when the test finishes. This will fail if there is not a preexisting directory at that path, and since the directory is deleted subsequent runs of the test will fail. Instead of relying on this hard coded directory name use mktemp to generate a temporary directory to use as a mountpoint, fixing both the assumption and the disruption caused by deleting a preexisting directory. This means that if the relevant cgroup filesystem is not already mounted then we rely on having coreutils (which provides mktemp) installed. I suspect that many current users are relying on having things automounted by default, and given that the script relies on bash it's probably not an unreasonable requirement. Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25landlock: Add the errata interfaceMickaël Salaün1-1/+45
commit 15383a0d63dbcd63dc7e8d9ec1bf3a0f7ebf64ac upstream. Some fixes may require user space to check if they are applied on the running kernel before using a specific feature. For instance, this applies when a restriction was previously too restrictive and is now getting relaxed (e.g. for compatibility reasons). However, non-visible changes for legitimate use (e.g. security fixes) do not require an erratum. Because fixes are backported down to a specific Landlock ABI, we need a way to avoid cherry-pick conflicts. The solution is to only update a file related to the lower ABI impacted by this issue. All the ABI files are then used to create a bitmask of fixes. The new errata interface is similar to the one used to get the supported Landlock ABI version, but it returns a bitmask instead because the order of fixes may not match the order of versions, and not all fixes may apply to all versions. The actual errata will come with dedicated commits. The description is not actually used in the code but serves as documentation. Create the landlock_abi_version symbol and use its value to check errata consistency. Update test_base's create_ruleset_checks_ordering tests and add errata tests. This commit is backportable down to the first version of Landlock. Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features") Cc: Günther Noack <gnoack@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25test suite: use %zu to print size_tMatthew Wilcox (Oracle)1-2/+2
[ Upstream commit a30951d09c33c899f0e4aca80eb87fad5f10ecfa ] On 32-bit, we can't use %lu to print a size_t variable and gcc warns us about it. Shame it doesn't warn about it on 64-bit. Link: https://lkml.kernel.org/r/20250403003311.359917-1-Liam.Howlett@oracle.com Fixes: cc86e0c2f306 ("radix tree test suite: add support for slab bulk APIs") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25selftests: mptcp: close fd_in before returning in main_loopGeliang Tang1-2/+5
commit c183165f87a486d5879f782c05a23c179c3794ab upstream. The file descriptor 'fd_in' is opened when cfg_input is configured, but not closed in main_loop(), this patch fixes it. Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Co-developed-by: Cong Liu <liucong2@kylinos.cn> Signed-off-by: Cong Liu <liucong2@kylinos.cn> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-3-34161a482a7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25ktest: Fix Test Failures Due to Missing LOG_FILE DirectoriesAyush Jain1-0/+8
[ Upstream commit 5a1bed232781d356f842576daacc260f0d0c8d2e ] Handle missing parent directories for LOG_FILE path to prevent test failures. If the parent directories don't exist, create them to ensure the tests proceed successfully. Cc: <warthog9@eaglescrag.net> Link: https://lore.kernel.org/20250307043854.2518539-1-Ayush.jain3@amd.com Signed-off-by: Ayush Jain <Ayush.jain3@amd.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25pm: cpupower: bench: Prevent NULL dereference on malloc failureZhongqiu Han1-0/+4
[ Upstream commit 208baa3ec9043a664d9acfb8174b332e6b17fb69 ] If malloc returns NULL due to low memory, 'config' pointer can be NULL. Add a check to prevent NULL dereference. Link: https://lore.kernel.org/r/20250219122715.3892223-1-quic_zhonhan@quicinc.com Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25selftests/futex: futex_waitv wouldblock test should failEdward Liaw1-1/+1
[ Upstream commit 7d50e00fef2832e98d7e06bbfc85c1d66ee110ca ] Testcase should fail if -EWOULDBLOCK is not returned when expected value differs from actual value from the waiter. Link: https://lore.kernel.org/r/20250404221225.1596324-1-edliaw@google.com Fixes: 9d57f7c79748920636f8293d2f01192d702fe390 ("selftests: futex: Test sys_futex_waitv() wouldblock") Signed-off-by: Edward Liaw <edliaw@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf tools: annotate asm_pure_loop.SMarcus Meissner1-0/+2
[ Upstream commit 9a352a90e88a041f4b26d359493e12a7f5ae1a6a ] Annotate so it is built with non-executable stack. Fixes: 8b97519711c3 ("perf test: Add asm pureloop test tool") Signed-off-by: Marcus Meissner <meissner@suse.de> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250323085410.23751-1-meissner@suse.de Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf python: Check if there is space to copy all the eventArnaldo Carvalho de Melo1-0/+5
[ Upstream commit 89aaeaf84231157288035b366cb6300c1c6cac64 ] The pyrf_event__new() method copies the event obtained from the perf ring buffer to a structure that will then be turned into a python object for further consumption, so it copies perf_event.header.size bytes to its 'event' member: $ pahole -C pyrf_event /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so struct pyrf_event { PyObject ob_base; /* 0 16 */ struct evsel * evsel; /* 16 8 */ struct perf_sample sample; /* 24 312 */ /* XXX last struct has 7 bytes of padding, 2 holes */ /* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */ union perf_event event; /* 336 4168 */ /* size: 4504, cachelines: 71, members: 4 */ /* member types with holes: 1, total: 2 */ /* paddings: 1, sum paddings: 7 */ /* last cacheline: 24 bytes */ }; $ It was doing so without checking if the event just obtained has more than that space, fix it. This isn't a proper, final solution, as we need to support larger events, but for the time being we at least bounds check and document it. Fixes: 877108e42b1b9ba6 ("perf tools: Initial python binding") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-7-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf python: Don't keep a raw_data pointer to consumed ring buffer spaceArnaldo Carvalho de Melo1-3/+1
[ Upstream commit f3fed3ae34d606819d87a63d970cc3092a5be7ab ] When processing tracepoints the perf python binding was parsing the event before calling perf_mmap__consume(&md->core) in pyrf_evlist__read_on_cpu(). But part of this event parsing was to set the perf_sample->raw_data pointer to the payload of the event, which then could be overwritten by other event before tracepoint fields were asked for via event.prev_comm in a python program, for instance. This also happened with other fields, but strings were were problems were surfacing, as there is UTF-8 validation for the potentially garbled data. This ended up showing up as (with some added debugging messages): ( field 'prev_comm' ret=0x7f7c31f65110, raw_size=68 ) ( field 'prev_pid' ret=0x7f7c23b1bed0, raw_size=68 ) ( field 'prev_prio' ret=0x7f7c239c0030, raw_size=68 ) ( field 'prev_state' ret=0x7f7c239c0250, raw_size=68 ) time 14771421785867 prev_comm= prev_pid=1919907691 prev_prio=796026219 prev_state=0x303a32313175 ==> ( XXX '��' len=16, raw_size=68) ( field 'next_comm' ret=(nil), raw_size=68 ) Traceback (most recent call last): File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 51, in <module> main() File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 46, in main event.next_comm, ^^^^^^^^^^^^^^^ AttributeError: 'perf.sample_event' object has no attribute 'next_comm' When event.next_comm was asked for, the PyUnicode_FromString() python API would fail and that tracepoint field wouldn't be available, stopping the tools/perf/python/tracepoint.py test tool. But, since we already do a copy of the whole event in pyrf_event__new, just use it and while at it remove what was done in in e8968e654191390a ("perf python: Fix pyrf_evlist__read_on_cpu event consuming") because we don't really need to wait for parsing the sample before declaring the event as consumed. This copy is questionable as is now, as it limits the maximum event + sample_type and tracepoint payload to sizeof(union perf_event), this all has been "working" because 'struct perf_event_mmap2', the largest entry in 'union perf_event' is: $ pahole -C perf_event ~/bin/perf | grep mmap2 struct perf_record_mmap2 mmap2; /* 0 4168 */ $ Fixes: bae57e3825a3dded ("perf python: Add support to resolve tracepoint fields") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-6-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf python: Decrement the refcount of just created event on failureArnaldo Carvalho de Melo1-1/+5
[ Upstream commit 3de5a2bf5b4847f7a59a184568f969f8fe05d57f ] To avoid a leak if we have the python object but then something happens and we need to return the operation, decrement the offset of the newly created object. Fixes: 377f698db12150a1 ("perf python: Add struct evsel into struct pyrf_event") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-5-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf python: Fixup description of sample.id event memberArnaldo Carvalho de Melo1-1/+1
[ Upstream commit 1376c195e8ad327bb9f2d32e0acc5ac39e7cb30a ] Some old cut'n'paste error, its "ip", so the description should be "event ip", not "event type". Fixes: 877108e42b1b9ba6 ("perf tools: Initial python binding") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-2-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf units: Fix insufficient array spaceArnaldo Carvalho de Melo1-1/+1
[ Upstream commit cf67629f7f637fb988228abdb3aae46d0c1748fe ] No need to specify the array size, let the compiler figure that out. This addresses this compiler warning that was noticed while build testing on fedora rawhide: 31 15.81 fedora:rawhide : FAIL gcc version 15.0.1 20250225 (Red Hat 15.0.1-0) (GCC) util/units.c: In function 'unit_number__scnprintf': util/units.c:67:24: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-initialization] 67 | char unit[4] = "BKMG"; | ^~~~~~ cc1: all warnings being treated as errors Fixes: 9808143ba2e54818 ("perf tools: Add unit_number__scnprintf function") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250310194534.265487-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10perf evlist: Add success path to evlist__create_syswide_mapsIan Rogers1-7/+6
[ Upstream commit fe0ce8a9d85a48642880c9b78944cb0d23e779c5 ] Over various refactorings evlist__create_syswide_maps has been made to only ever return with -ENOMEM. Fix this so that when perf_evlist__set_maps is successfully called, 0 is returned. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-3-irogers@google.com Fixes: 8c0498b6891d7ca5 ("perf evlist: Fix create_syswide_maps() not propagating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10selftests/bpf: Select NUMA_NO_NODE to create mapSaket Kumar Bhaskar1-0/+5
[ Upstream commit 4107a1aeb20ed4cdad6a0d49de92ea0f933c71b7 ] On powerpc, a CPU does not necessarily originate from NUMA node 0. This contrasts with architectures like x86, where CPU 0 is not hot-pluggable, making NUMA node 0 a consistently valid node. This discrepancy can lead to failures when creating a map on NUMA node 0, which is initialized by default, if no CPUs are allocated from NUMA node 0. This patch fixes the issue by setting NUMA_NO_NODE (-1) for map creation for this selftest. Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/cf1f61468b47425ecf3728689bc9636ddd1d910e.1738302337.git.skb99@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10selftests/bpf: Fix string read in strncmp benchmarkViktor Malik1-1/+4
[ Upstream commit de07b182899227d5fd1ca7a1a7d495ecd453d49c ] The strncmp benchmark uses the bpf_strncmp helper and a hand-written loop to compare two strings. The values of the strings are filled from userspace. One of the strings is non-const (in .bss) while the other is const (in .rodata) since that is the requirement of bpf_strncmp. The problem is that in the hand-written loop, Clang optimizes the reads from the const string to always return 0 which breaks the benchmark. Use barrier_var to prevent the optimization. The effect can be seen on the strncmp-no-helper variant. Before this change: # ./bench strncmp-no-helper Setting up benchmark 'strncmp-no-helper'... Benchmark 'strncmp-no-helper' started. Iter 0 (112.309us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 1 (-23.238us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 2 ( 58.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 3 (-30.466us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 4 ( 29.996us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 5 ( 16.949us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 6 (-60.035us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s After this change: # ./bench strncmp-no-helper Setting up benchmark 'strncmp-no-helper'... Benchmark 'strncmp-no-helper' started. Iter 0 ( 77.711us): hits 5.534M/s ( 5.534M/prod), drops 0.000M/s, total operations 5.534M/s Iter 1 ( 11.215us): hits 6.006M/s ( 6.006M/prod), drops 0.000M/s, total operations 6.006M/s Iter 2 (-14.253us): hits 5.931M/s ( 5.931M/prod), drops 0.000M/s, total operations 5.931M/s Iter 3 ( 59.087us): hits 6.005M/s ( 6.005M/prod), drops 0.000M/s, total operations 6.005M/s Iter 4 (-21.379us): hits 6.010M/s ( 6.010M/prod), drops 0.000M/s, total operations 6.010M/s Iter 5 (-20.310us): hits 5.861M/s ( 5.861M/prod), drops 0.000M/s, total operations 5.861M/s Iter 6 ( 53.937us): hits 6.004M/s ( 6.004M/prod), drops 0.000M/s, total operations 6.004M/s Summary: hits 5.969 ± 0.061M/s ( 5.969M/prod), drops 0.000 ± 0.000M/s, total operations 5.969 ± 0.061M/s Fixes: 9c42652f8be3 ("selftests/bpf: Add benchmark for bpf_strncmp() helper") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20250313122852.1365202-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10libbpf: Fix hypothetical STT_SECTION extern NULL deref caseAndrii Nakryiko1-1/+1
[ Upstream commit e0525cd72b5979d8089fe524a071ea93fd011dc9 ] Fix theoretical NULL dereference in linker when resolving *extern* STT_SECTION symbol against not-yet-existing ELF section. Not sure if it's possible in practice for valid ELF object files (this would require embedded assembly manipulations, at which point BTF will be missing), but fix the s/dst_sym/dst_sec/ typo guarding this condition anyways. Fixes: faf6ed321cf6 ("libbpf: Add BPF static linker APIs") Fixes: a46349227cd8 ("libbpf: Add linker extern resolution support for functions and global variables") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250220002821.834400-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21selftests: rtnetlink: update netdevsim ipsec output formatHangbin Liu1-2/+2
commit 3ec920bb978ccdc68a7dfb304d303d598d038cb1 upstream. After the netdevsim update to use human-readable IP address formats for IPsec, we can now use the source and destination IPs directly in testing. Here is the result: # ./rtnetlink.sh -t kci_test_ipsec_offload PASS: ipsec_offload Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241010040027.21440-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21net: ipv4: Cache pmtu for all packet paths if multipath enabledVladimir Vdovin1-17/+95
[ Upstream commit 7d3f3b4367f315a61fc615e3138f3d320da8c466 ] Check number of paths by fib_info_num_path(), and update_or_create_fnhe() for every path. Problem is that pmtu is cached only for the oif that has received icmp message "need to frag", other oifs will still try to use "default" iface mtu. An example topology showing the problem: | host1 +---------+ | dummy0 | 10.179.20.18/32 mtu9000 +---------+ +-----------+----------------+ +---------+ +---------+ | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31 +---------+ +---------+ | (all here have mtu 9000) | +------+ +------+ | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31 +------+ +------+ | | ---------+------------+-------------------+------ | +-----+ | ro3 | 10.10.10.10 mtu1500 +-----+ | ======================================== some networks ======================================== | +-----+ | eth0| 10.10.30.30 mtu9000 +-----+ | host2 host1 have enabled multipath and sysctl net.ipv4.fib_multipath_hash_policy = 1: default proto static src 10.179.20.18 nexthop via 10.179.2.12 dev ens17f1 weight 1 nexthop via 10.179.2.140 dev ens17f0 weight 1 When host1 tries to do pmtud from 10.179.20.18/32 to host2, host1 receives at ens17f1 iface an icmp packet from ro3 that ro3 mtu=1500. And host1 caches it in nexthop exceptions cache. Problem is that it is cached only for the iface that has received icmp, and there is no way that ro3 will send icmp msg to host1 via another path. Host1 now have this routes to host2: ip r g 10.10.30.30 sport 30000 dport 443 10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0 cache expires 521sec mtu 1500 ip r g 10.10.30.30 sport 30033 dport 443 10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0 cache So when host1 tries again to reach host2 with mtu>1500, if packet flow is lucky enough to be hashed with oif=ens17f1 its ok, if oif=ens17f0 it blackholes and still gets icmp msgs from ro3 to ens17f1, until lucky day when ro3 will send it through another flow to ens17f0. Signed-off-by: Vladimir Vdovin <deliran@verdict.gg> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 139512191bd0 ("ipv4: use RCU protection in __ip_rt_update_pmtu()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21selftests: gpio: gpio-sim: Fix missing chip disablementsKoichiro Den1-6/+25
[ Upstream commit f8524ac33cd452aef5384504b3264db6039a455e ] Since upstream commit 8bd76b3d3f3a ("gpio: sim: lock up configfs that an instantiated device depends on"), rmdir for an active virtual devices been prohibited. Update gpio-sim selftest to align with the change. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202501221006.a1ca5dfa-lkp@intel.com Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Link: https://lore.kernel.org/r/20250122043309.304621-1-koichiro.den@canonical.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>