summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2025-07-07selftests/bpf: Negative test case for tail call mapPaul Chaignon2-0/+33
This patch adds a negative test case for the following verifier error. expected prog array map for tail call Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/aGu0i1X_jII-3aFa@mail.gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07selftests/bpf: Add Spectre v4 testsLuis Gerhorst2-0/+153
Add the following tests: 1. A test with an (unimportant) ldimm64 (16 byte insn) and a Spectre-v4--induced nospec that clarifies and serves as a basic Spectre v4 test. 2. Make sure a Spectre v4 nospec_result does not prevent a Spectre v1 nospec from being added before the dangerous instruction (tests that [1] is fixed). 3. Combine the two, which is the combination that triggers the warning in [2]. This is because the unanalyzed stack write has nospec_result set, but the ldimm64 (which was just analyzed) had incremented insn_idx by 2. That violates the assertion that nospec_result is only used after insns that increment insn_idx by 1 (i.e., stack writes). [1] https://lore.kernel.org/bpf/4266fd5de04092aa4971cbef14f1b4b96961f432.camel@gmail.com/ [2] https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/ Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de> Link: https://lore.kernel.org/r/20250705190908.1756862-3-luis.gerhorst@fau.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07selftests/bpf: Set CONFIG_PACKET=y for selftestsSaket Kumar Bhaskar1-0/+1
BPF selftest fails to build with below error: CLNG-BPF [test_progs] lsm_cgroup.bpf.o progs/lsm_cgroup.c:105:21: error: variable has incomplete type 'struct sockaddr_ll' 105 | struct sockaddr_ll sa = {}; | ^ progs/lsm_cgroup.c:105:9: note: forward declaration of 'struct sockaddr_ll' 105 | struct sockaddr_ll sa = {}; | ^ 1 error generated. lsm_cgroup selftest requires sockaddr_ll structure which is not there in vmlinux.h when the kernel is built with CONFIG_PACKET=m. Enabling CONFIG_PACKET=y ensures that sockaddr_ll is available in vmlinux, allowing it to be captured in the generated vmlinux.h for bpf selftests. Reported-by: Sachin P Bappalige <sachinpb@linux.ibm.com> Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250707071735.705137-1-skb99@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07selftests/bpf: tests for __arg_untrusted void * global func paramsEduard Zingerman1-0/+53
Check usage of __arg_untrusted parameters of primitive type: - passing of {trusted, untrusted, map value, scalar value, values with variable offset} to untrusted `void *`, `char *` or enum is ok; - varifier represents such parameters as rdonly_untrusted_mem(sz=0). Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07selftests/bpf: test cases for __arg_untrustedEduard Zingerman1-0/+81
Check usage of __arg_untrusted parameters with PTR_TO_BTF_ID: - combining __arg_untrusted with other tags is forbidden; - non-kernel (program local) types for __arg_untrusted are forbidden; - passing of {trusted, untrusted, map value, scalar value, values with variable offset} to untrusted is ok; - passing of PTR_TO_BTF_ID with a different type to untrusted is ok; - passing of untrusted to trusted is forbidden. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07libbpf: __arg_untrusted in bpf_helpers.hEduard Zingerman1-0/+1
Make btf_decl_tag("arg:untrusted") available for libbpf users via macro. Makes the following usage possible: void foo(struct bar *p __arg_untrusted) { ... } void bar(struct foo *p __arg_trusted) { ... foo(p->buz->bar); // buz derefrence looses __trusted ... } Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07selftests/bpf: ptr_to_btf_id struct walk ending with primitive pointerEduard Zingerman1-0/+31
Validate that reading a PTR_TO_BTF_ID field produces a value of type PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED, if field is a pointer to a primitive type. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07bpf: rdonly_untrusted_mem for btf id walk pointer leafsEduard Zingerman1-1/+1
When processing a load from a PTR_TO_BTF_ID, the verifier calculates the type of the loaded structure field based on the load offset. For example, given the following types: struct foo { struct foo *a; int *b; } *p; The verifier would calculate the type of `p->a` as a pointer to `struct foo`. However, the type of `p->b` is currently calculated as a SCALAR_VALUE. This commit updates the logic for processing PTR_TO_BTF_ID to instead calculate the type of p->b as PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED. This change allows further dereferencing of such pointers (using probe memory instructions). Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250704230354.1323244-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07tools/nolibc: avoid false-positive -Wmaybe-uninitialized through waitpid()Thomas Weißschuh1-1/+1
The compiler does not know that waitid() will only ever return 0 or -1. If waitid() would return a positive value than waitpid() would return that same value and *status would not be initialized. However users calling waitpid() know that the only possible return values of it are 0 or -1. They therefore might check for errors with 'ret == -1' or 'ret < 0' and use *status otherwise. The compiler will then warn about the usage of a potentially uninitialized variable. Example: $ cat test.c #include <stdio.h> #include <unistd.h> int main(void) { int ret, status; ret = waitpid(0, &status, 0); if (ret == -1) return 0; printf("status %x\n", status); return 0; } $ gcc --version gcc (GCC) 15.1.1 20250425 $ gcc -Wall -Os -Werror -nostdlib -nostdinc -static -Iusr/include -Itools/include/nolibc/ -o /dev/null test.c test.c: In function ‘main’: test.c:12:9: error: ‘status’ may be used uninitialized [-Werror=maybe-uninitialized] 12 | printf("status %x\n", status); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test.c:6:18: note: ‘status’ was declared here 6 | int ret, status; | ^~~~~~ cc1: all warnings being treated as errors Avoid the warning by normalizing waitid() errors to '-1' in waitpid(). Fixes: 0c89abf5ab3f ("tools/nolibc: implement waitpid() in terms of waitid()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250707-nolibc-waitpid-uninitialized-v1-1-dcd4e70bcd8f@linutronix.de Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-06Merge tag 'objtool_urgent_for_v6.16_rc5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Fix the compilation of an x86 kernel on a big engian machine due to a missed endianness conversion * tag 'objtool_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add missing endian conversion to read_annotate()
2025-07-06Merge tag 'locking_urgent_for_v6.16_rc5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Disable FUTEX_PRIVATE_HASH for this cycle due to a performance regression - Add a selftests compilation product to the corresponding .gitignore file * tag 'locking_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/futex: Add futex_numa to .gitignore futex: Temporary disable FUTEX_PRIVATE_HASH
2025-07-06selftests/futex: Convert 32-bit timespec to 64-bit version for 32-bit ↵Terry Tritton1-1/+7
compatibility mode sys_futex_wait() expects a struct __kernel_timespec pointer for the timeout, but the provided struct timespec pointer is of type struct old_timespec32 when compiled for 32-bit architectures, unless they use 64-bit timespecs already. Make it work for all variants by converting the provided timespec value into a local struct __kernel_timespec and provide a pointer to it to the syscall. This is a pointless operation for 64-bit, but this is not a hotpath operation, so keep it simple. This fix is based off [1] Originally-by: Wei Gao <wegao@suse.com> Signed-off-by: Terry Tritton <terry.tritton@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250704190234.14230-1-terry.tritton@linaro.org Link: https://lore.kernel.org/all/20231203235117.29677-1-wegao@suse.com/ [1]
2025-07-06selftests/nolibc: correctly report errors from printf() and friendsThomas Weißschuh2-2/+25
When an error is encountered by printf() it needs to be reported. errno() is already set by the callback. sprintf() is different, but that keeps working and is already tested. Also add a new test. Fixes: 7e4346f4a3a6 ("tools/nolibc/stdio: add a minimal [vf]printf() implementation") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250704-nolibc-printf-error-v1-2-74b7a092433b@linutronix.de Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-06selftests/nolibc: create /dev/full when running as PID 1Thomas Weißschuh1-1/+3
An upcoming testcase will use /dev/full. Make sure it is always present. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250704-nolibc-printf-error-v1-1-74b7a092433b@linutronix.de Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-06tools/nolibc: add support for clock_nanosleep() and nanosleep()Thomas Weißschuh2-0/+35
Also add some tests. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20250704-nolibc-nanosleep-v1-1-d79c19701952@linutronix.de Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-06selftests/futex: Add futex_numa to .gitignoreTerry Tritton1-0/+1
futex_numa was never added to the .gitignore file. Add it. Fixes: 9140f57c1c13 ("futex,selftests: Add another FUTEX2_NUMA selftest") Signed-off-by: Terry Tritton <terry.tritton@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/all/20250704103749.10341-1-terry.tritton@linaro.org
2025-07-04cxl/test: Simplify fw_buf_checksum_show()Eric Biggers1-19/+2
First, just use sha256() instead of a sequence of sha256_init(), sha256_update(), and sha256_final(). The result is the same. Second, use *phN instead of open-coding the conversion of bytes to hex. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630160645.3198-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04Merge tag 'vfs-6.16-rc5.fixes' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a regression caused by the anonymous inode rework. Making them regular files causes various places in the kernel to tip over starting with io_uring. Revert to the former status quo and port our assertion to be based on checking the inode so we don't lose the valuable VFS_*_ON_*() assertions that have already helped discover weird behavior our outright bugs. - Fix the the upper bound calculation in fuse_fill_write_pages() - Fix priority inversion issues in the eventpoll code - Make secretmen use anon_inode_make_secure_inode() to avoid bypassing the LSM layer - Fix a netfs hang due to missing case in final DIO read result collection - Fix a double put of the netfs_io_request struct - Provide some helpers to abstract out NETFS_RREQ_IN_PROGRESS flag wrangling - Fix infinite looping in netfs_wait_for_pause/request() - Fix a netfs ref leak on an extra subrequest inserted into a request's list of subreqs - Fix various cifs RPC callbacks to set NETFS_SREQ_NEED_RETRY if a subrequest fails retriably - Fix a cifs warning in the workqueue code when reconnecting a channel - Fix the updating of i_size in netfs to avoid a race between testing if we should have extended the file with a DIO write and changing i_size - Merge the places in netfs that update i_size on write - Fix coredump socket selftests * tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: anon_inode: rework assertions netfs: Update tracepoints in a number of ways netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read netfs: Merge i_size update functions netfs: Fix i_size updating smb: client: set missing retry flag in cifs_writev_callback() smb: client: set missing retry flag in cifs_readv_callback() smb: client: set missing retry flag in smb2_writev_callback() netfs: Fix ref leak on inserted extra subreq in write retry netfs: Fix looping in wait functions netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling netfs: Fix double put of request netfs: Fix hang due to missing case in final DIO read result collection eventpoll: Fix priority inversion problem fuse: fix fuse_fill_write_pages() upper bound calculation fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass selftests/coredump: Fix "socket_detect_userspace_client" test failure
2025-07-04kselftest/arm64: Add a test for vfork() with GCSMark Brown1-0/+63
Ensure that we've got at least some coverage of the special cases around vfork() by adding a test case in basic-gcs doing the same thing as the plain fork() one - vfork(), do a few checks and then return to the parent. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-arm64-gcs-vfork-exit-v3-3-1e9a9d2ddbbe@kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-04selftests/nolibc: Add coverage of vfork()Mark Brown1-4/+19
Generalise the existing fork() test to also cover the newly added vfork() implementation. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250703-arm64-gcs-vfork-exit-v3-4-1e9a9d2ddbbe@kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-04tools/nolibc: Provide vfork()Mark Brown2-0/+45
To allow testing of vfork() support in the arm64 basic-gcs test provide an implementation for nolibc, using the vfork() syscall if one is available and otherwise clone3(). We implement in terms of clone3() since the order of the arguments for clone() varies between architectures. As for fork() SPARC returns the parent PID rather than 0 in the child for vfork() so needs custom handling. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250703-arm64-gcs-vfork-exit-v3-2-1e9a9d2ddbbe@kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-04tools/nolibc: Replace ifdef with if defined() in sys.hMark Brown1-15/+15
Thomas has requested that if defined() be used in place of ifdef but currently ifdef is used consistently in sys.h. Update all the instances of ifdef to if defined(). Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250703-arm64-gcs-vfork-exit-v3-1-1e9a9d2ddbbe@kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-07-04tools/nolibc: add support for SuperHThomas Weißschuh4-1/+173
Add support for SuperH/"sh" to nolibc. Only sh4 is tested for now. The startup code is special: __nolibc_entrypoint_epilogue() calls __builtin_unreachable() which emits a call to abort(). To make this work a function prologue is generated to set up a GOT pointer which corrupts "sp". __builtin_unreachable() is necessary for __attribute__((noreturn)). Also depending on compiler flags (for example -fPIC) even more prologue is generated. Work around this by defining a nested function in asm. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70216 Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Acked-by: Rob Landley <rob@landley.net> Acked-by: D. Jeff Dionne <jeff@coresemi.io> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20250623-nolibc-sh-v2-3-0f5b4b303025@weissschuh.net
2025-07-04selftests: net: extend SCM_PIDFD test to cover stale pidfdsAlexander Mikhalitsyn1-44/+173
Extend SCM_PIDFD test scenarios to also cover dead task's pidfd retrieval and reading its exit info. Cc: linux-kselftest@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: Shuah Khan <shuah@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-8-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni5-19/+46
Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-04selftests/bpf: Add tests for prog streamsKumar Kartikeya Dwivedi3-0/+253
Add selftests to stress test the various facets of the stream API, memory allocation pattern, and ensuring dumping support is tested and functional. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-13-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-04bpftool: Add support for dumping streamsKumar Kartikeya Dwivedi3-2/+70
Add support for printing the BPF stream contents of a program in bpftool. The new bpftool prog tracelog command is extended to take stdout and stderr arguments, and then the prog specification. The bpf_prog_stream_read() API added in previous patch is simply reused to grab data and then it is dumped to the respective file. The stdout data is sent to stdout, and stderr is printed to stderr. Cc: Quentin Monnet <qmo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-12-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-04libbpf: Introduce bpf_prog_stream_read() APIKumar Kartikeya Dwivedi3-0/+42
Introduce a libbpf API so that users can read data from a given BPF stream for a BPF prog fd. For now, only the low-level syscall wrapper is provided, we can add a bpf_program__* accessor as a follow up if needed. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-11-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-04libbpf: Add bpf_stream_printk() macroKumar Kartikeya Dwivedi1-0/+16
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to print to the respective streams. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-04bpf: Introduce BPF standard streamsKumar Kartikeya Dwivedi1-0/+24
Add support for a stream API to the kernel and expose related kfuncs to BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These can be used for printing messages that can be consumed from user space, thus it's similar in spirit to existing trace_pipe interface. The kernel will use the BPF_STDERR stream to notify the program of any errors encountered at runtime. BPF programs themselves may use both streams for writing debug messages. BPF library-like code may use BPF_STDERR to print warnings or errors on misuse at runtime. The implementation of a stream is as follows. Everytime a message is emitted from the kernel (directly, or through a BPF program), a record is allocated by bump allocating from per-cpu region backed by a page obtained using alloc_pages_nolock(). This ensures that we can allocate memory from any context. The eventual plan is to discard this scheme in favor of Alexei's kmalloc_nolock() [0]. This record is then locklessly inserted into a list (llist_add()) so that the printing side doesn't require holding any locks, and works in any context. Each stream has a maximum capacity of 4MB of text, and each printed message is accounted against this limit. Messages from a program are emitted using the bpf_stream_vprintk kfunc, which takes a stream_id argument in addition to working otherwise similar to bpf_trace_vprintk. The bprintf buffer helpers are extracted out to be reused for printing the string into them before copying it into the stream, so that we can (with the defined max limit) format a string and know its true length before performing allocations of the stream element. For consuming elements from a stream, we expose a bpf(2) syscall command named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the stream of a given prog_fd into a user space buffer. The main logic is implemented in bpf_stream_read(). The log messages are queued in bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and ordered correctly in the stream backlog. For this purpose, we hold a lock around bpf_stream_backlog_peek(), as llist_del_first() (if we maintained a second lockless list for the backlog) wouldn't be safe from multiple threads anyway. Then, if we fail to find something in the backlog log, we splice out everything from the lockless log, and place it in the backlog log, and then return the head of the backlog. Once the full length of the element is consumed, we will pop it and free it. The lockless list bpf_stream::log is a LIFO stack. Elements obtained using a llist_del_all() operation are in LIFO order, thus would break the chronological ordering if printed directly. Hence, this batch of messages is first reversed. Then, it is stashed into a separate list in the stream, i.e. the backlog_log. The head of this list is the actual message that should always be returned to the caller. All of this is done in bpf_stream_backlog_fill(). From the kernel side, the writing into the stream will be a bit more involved than the typical printk. First, the kernel typically may print a collection of messages into the stream, and parallel writers into the stream may suffer from interleaving of messages. To ensure each group of messages is visible atomically, we can lift the advantage of using a lockless list for pushing in messages. To enable this, we add a bpf_stream_stage() macro, and require kernel users to use bpf_stream_printk statements for the passed expression to write into the stream. Underneath the macro, we have a message staging API, where a bpf_stream_stage object on the stack accumulates the messages being printed into a local llist_head, and then a commit operation splices the whole batch into the stream's lockless log list. This is especially pertinent for rqspinlock deadlock messages printed to program streams. After this change, we see each deadlock invocation as a non-interleaving contiguous message without any confusion on the reader's part, improving their user experience in debugging the fault. While programs cannot benefit from this staged stream writing API, they could just as well hold an rqspinlock around their print statements to serialize messages, hence this is kept kernel-internal for now. Overall, this infrastructure provides NMI-safe any context printing of messages to two dedicated streams. Later patches will add support for printing splats in case of BPF arena page faults, rqspinlock deadlocks, and cond_break timeouts, and integration of this facility into bpftool for dumping messages to user space. [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-04selftests/bpf: Add test cases for bpf_dynptr_memset()Ihor Solodrai2-0/+166
Add tests to verify the behavior of bpf_dynptr_memset(): * normal memset 0 * normal memset non-0 * memset with an offset * memset in dynptr that was adjusted * error: size overflow * error: offset+size overflow * error: readonly dynptr * memset into non-linear xdp dynptr Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/bpf/20250702210309.3115903-3-isolodrai@meta.com
2025-07-03selftests/nolibc: use file driver for QEMU serialThomas Weißschuh1-2/+2
For the test implementation of the SuperH architecture a second serial serial port needs to be used. Unfortunately the currently used 'stdio' driver does not support multiple serial ports at the same time. Switch to the 'file' driver which does support multiple ports and is sufficient for the nolibc-test usecase. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20250623-nolibc-sh-v2-2-0f5b4b303025@weissschuh.net
2025-07-03selftests/nolibc: fix EXTRACONFIG variables orderingThomas Weißschuh1-2/+2
The variable block got disordered at some point. Use the correct ordering. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20250623-nolibc-sh-v2-1-0f5b4b303025@weissschuh.net
2025-07-03KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_tAnshuman Khandual1-2/+2
Change MDSCR_EL1 register holding local variables as uint64_t that reflects its true register width as well. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Joey Gouly <joey.gouly@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.linux.dev Cc: linux-kernel@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250613023646.1215700-3-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03perf test: Add more test cases to sched testNamhyung Kim1-8/+31
$ sudo ./perf test -vv 92 92: perf sched tests: --- start --- test child forked, pid 1360101 Sched record pid 1360105's current affinity list: 0-3 pid 1360105's new affinity list: 0 pid 1360107's current affinity list: 0-3 pid 1360107's new affinity list: 0 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 4.330 MB /tmp/__perf_test_sched.perf.data.b3319 (12246 samples) ] Sched latency Sched script Sched map Sched timehist Samples of sched_switch event do not have callchains. ---- end(0) ---- 92: perf sched tests : Ok Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Fix memory leaks in 'perf sched latency'Namhyung Kim1-3/+24
The work_atoms should be freed after use. Add free_work_atoms() to make sure to release all. It should use list_splice_init() when merging atoms to prevent accessing invalid pointers. Fixes: b1ffe8f3e0c96f552 ("perf sched: Finish latency => atom rename and misc cleanups") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Use RC_CHK_EQUAL() to compare pointersNamhyung Kim1-1/+1
So that it can check two pointers to the same object properly when REFCNT_CHECKING is on. Fixes: 78c32f4cb12f9430 ("libperf rc_check: Add RC_CHK_EQUAL") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Fix memory leaks for evsel->priv in timehistNamhyung Kim3-0/+25
It uses evsel->priv to save per-cpu timing information. It should be freed when the evsel is released. Add the priv destructor for evsel same as thread to handle that. Fixes: 49394a2a24c78ce0 ("perf sched timehist: Introduce timehist command") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Fix thread leaks in 'perf sched timehist'Namhyung Kim1-11/+37
Add missing thread__put() after machine__findnew_thread() or timehist_get_thread(). Also idle threads' last_thread should be refcounted properly. Fixes: 699b5b920db04a6f ("perf sched timehist: Save callchain when entering idle") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Fix memory leaks in 'perf sched map'Namhyung Kim1-11/+20
It maintains per-cpu pointers for the current thread but it doesn't release the refcounts. Fixes: 5e895278697c014e ("perf sched: Move curr_thread initialization to perf_sched__map()") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Free thread->priv using priv_destructorNamhyung Kim1-0/+2
In many perf sched subcommand saves priv data structure in the thread but it forgot to free them. As it's an opaque type with 'void *', it needs to register that knows how to free the data. In this case, just regular 'free()' is fine. Fixes: 04cb4fc4d40a5bf1 ("perf thread: Allow tools to register a thread->priv destructor") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf sched: Make sure it frees the usage stringNamhyung Kim1-12/+13
The parse_options_subcommand() allocates the usage string based on the given subcommands. So it should reach the end of the function to free the string to prevent memory leaks. Fixes: 1a5efc9e13f357ab ("libsubcmd: Don't free the usage string") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf tests make: Add NO_LIBDW=1 to minimal and add standalone testIan Rogers1-1/+3
Missing testing coverage of NO_LIBDW=1 and add NO_LIBDW=1 to the minimal test configuration. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703053622.3141424-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03perf header: Fix pipe mode header dumpingIan Rogers2-2/+15
The pipe mode header dumping was accidentally removed when tracing of header feature events in pipe mode was added. Minor spelling tweak to header test failure message. Fixes: 61051f9a8452 ("perf header: In pipe mode dump features without --header/-I") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250703042000.2740640-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03selftests/sched_ext: Fix exit selftest hang on UPAndrea Righi1-0/+8
On single-CPU systems, ops.select_cpu() is never called, causing the EXIT_SELECT_CPU test case to wait indefinitely. Avoid the stall by skipping this specific sub-test when only one CPU is available. Reported-by: Phil Auld <pauld@redhat.com> Fixes: a5db7817af780 ("sched_ext: Add selftests") Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Phil Auld <pauld@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-03kselftest/arm64: Specify SVE data when testing VL set in sve-ptraceMark Brown1-1/+2
Since f916dd32a943 ("arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state") we reject attempts to write to the streaming mode regset even if there is no register data supplied, causing the tests for setting vector lengths and setting SVE_VL_INHERIT in sve-ptrace to spuriously fail. Set the flag to avoid the issue, we still support not supplying register data. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-3-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptraceMark Brown1-1/+3
Since f916dd32a943 ("arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state") we do not support writing FPSIMD payload data when writing NT_ARM_SSVE but the sve-ptrace test has an explicit test for this being supported which was not updated to reflect the new behaviour. Fix the test to expect a failure when writing FPSIMD data to the streaming mode register set. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-2-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03kselftest/arm64: Fix check for setting new VLs in sve-ptraceMark Brown1-1/+1
The check that the new vector length we set was the expected one was typoed to an assignment statement which for some reason the compilers didn't spot, most likely due to the macros involved. Fixes: a1d7111257cd ("selftests: arm64: More comprehensively test the SVE ptrace interface") Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-1-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03kselftest/arm64: Convert tpidr2 test to use kselftest.hMark Brown2-104/+38
Recent work by Thomas Weißschuh means that it is now possible to use kselftest.h with nolibc. Convert the tpidr2 test which is nolibc specific to use kselftest.h, making it look more standard and ensuring it gets the benefit of any work done on kselftest.h. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-nolibc-header-v1-1-16ee1c6fbfed@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03perf test: In forked mode add check that fds aren't leakedIan Rogers1-0/+69
When a test is forked no file descriptors should be open, however, parent ones may have been inherited - in particular those of the pipes of other forked child test processes. Add a loop to clean-up/close those file descriptors prior to running the test. At the end of the test assert that no additional file descriptors are present as this would indicate a file descriptor leak. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250624190326.2038704-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>