summaryrefslogtreecommitdiff
path: root/tools/testing/selftests/bpf
AgeCommit message (Collapse)AuthorFilesLines
2026-04-02bpf: Fix u32/s32 bounds when ranges cross min/max boundaryEduard Zingerman1-4/+58
[ Upstream commit fbc7aef517d8765e4c425d2792409bb9bf2e1f13 ] Same as in __reg64_deduce_bounds(), refine s32/u32 ranges in __reg32_deduce_bounds() in the following situations: - s32 range crosses U32_MAX/0 boundary, positive part of the s32 range overlaps with u32 range: 0 U32_MAX | [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] | |----------------------------|----------------------------| |xxxxx s32 range xxxxxxxxx] [xxxxxxx| 0 S32_MAX S32_MIN -1 - s32 range crosses U32_MAX/0 boundary, negative part of the s32 range overlaps with u32 range: 0 U32_MAX | [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] | |----------------------------|----------------------------| |xxxxxxxxx] [xxxxxxxxxxxx s32 range | 0 S32_MAX S32_MIN -1 - No refinement if ranges overlap in two intervals. This helps for e.g. consider the following program: call %[bpf_get_prandom_u32]; w0 &= 0xffffffff; if w0 < 0x3 goto 1f; // on fall-through u32 range [3..U32_MAX] if w0 s> 0x1 goto 1f; // on fall-through s32 range [S32_MIN..1] if w0 s< 0x0 goto 1f; // range can be narrowed to [S32_MIN..-1] r10 = 0; 1: ...; The reg_bounds.c selftest is updated to incorporate identical logic, refinement based on non-overflowing range halves: ((x ∩ [0, smax]) ∩ (y ∩ [0, smax])) ∪ ((x ∩ [smin,-1]) ∩ (y ∩ [smin,-1])) Reported-by: Andrea Righi <arighi@nvidia.com> Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Closes: https://lore.kernel.org/bpf/aakqucg4vcujVwif@gpd4/T/ Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-1-f7f67e060a6b@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02bpf: Fix exception exit lock checking for subprogsIhor Solodrai1-3/+6
[ Upstream commit 6c2128505f61b504c79a20b89596feba61388112 ] process_bpf_exit_full() passes check_lock = !curframe to check_resource_leak(), which is false in cases when bpf_throw() is called from a static subprog. This makes check_resource_leak() to skip validation of active_rcu_locks, active_preempt_locks, and active_irq_id on exception exits from subprogs. At runtime bpf_throw() unwinds the stack via ORC without releasing any user-acquired locks, which may cause various issues as the result. Fix by setting check_lock = true for exception exits regardless of curframe, since exceptions bypass all intermediate frame cleanup. Update the error message prefix to "bpf_throw" for exception exits to distinguish them from normal BPF_EXIT. Fix reject_subprog_with_rcu_read_lock test which was previously passing for the wrong reason. Test program returned directly from the subprog call without closing the RCU section, so the error was triggered by the unclosed RCU lock on normal exit, not by bpf_throw. Update __msg annotations for affected tests to match the new "bpf_throw" error prefix. The spin_lock case is not affected because they are already checked [1] at the call site in do_check_insn() before bpf_throw can run. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v7.0-rc4#n21098 Assisted-by: Claude:claude-opus-4-6 Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions") Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260320000809.643798-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12selftests/bpf: Avoid simplification of crafted bounds testPaul Chaignon1-1/+1
[ Upstream commit 024cea2d647ed8ab942f19544b892d324dba42b4 ] The reg_bounds_crafted tests validate the verifier's range analysis logic. They focus on the actual ranges and thus ignore the tnum. As a consequence, they carry the assumption that the tested cases can be reproduced in userspace without using the tnum information. Unfortunately, the previous change the refinement logic breaks that assumption for one test case: (u64)2147483648 (u32)<op> [4294967294; 0x100000000] The tested bytecode is shown below. Without our previous improvement, on the false branch of the condition, R7 is only known to have u64 range [0xfffffffe; 0x100000000]. With our improvement, and using the tnum information, we can deduce that R7 equals 0x100000000. 19: (bc) w0 = w6 ; R6=0x80000000 20: (bc) w0 = w7 ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff)) 21: (be) if w6 <= w7 goto pc+3 ; R6=0x80000000 R7=0x100000000 R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds refines the s32 range to 0 using u32 and finally also sets u32=0. From this, __reg_bound_offset improves the tnum to (0; 0x100000000). Finally, our previous patch uses this new tnum to deduce that it only intersect with u64=[0xfffffffe; 0x100000000] in a single value: 0x100000000. Because the verifier uses the tnum to reach this constant value, the selftest is unable to reproduce it by only simulating ranges. The solution implemented in this patch is to change the test case such that there is more than one overlap value between u64 and the tnum. The max. u64 value is thus changed from 0x100000000 to 0x300000000. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12bpf: collect only live registers in linked regsEduard Zingerman3-35/+63
[ Upstream commit 2658a1720a1944fbaeda937000ad2b3c3dfaf1bb ] Fix an inconsistency between func_states_equal() and collect_linked_regs(): - regsafe() uses check_ids() to verify that cached and current states have identical register id mapping. - func_states_equal() calls regsafe() only for registers computed as live by compute_live_registers(). - clean_live_states() is supposed to remove dead registers from cached states, but it can skip states belonging to an iterator-based loop. - collect_linked_regs() collects all registers sharing the same id, ignoring the marks computed by compute_live_registers(). Linked registers are stored in the state's jump history. - backtrack_insn() marks all linked registers for an instruction as precise whenever one of the linked registers is precise. The above might lead to a scenario: - There is an instruction I with register rY known to be dead at I. - Instruction I is reached via two paths: first A, then B. - On path A: - There is an id link between registers rX and rY. - Checkpoint C is created at I. - Linked register set {rX, rY} is saved to the jump history. - rX is marked as precise at I, causing both rX and rY to be marked precise at C. - On path B: - There is no id link between registers rX and rY, otherwise register states are sub-states of those in C. - Because rY is dead at I, check_ids() returns true. - Current state is considered equal to checkpoint C, propagate_precision() propagates spurious precision mark for register rY along the path B. - Depending on a program, this might hit verifier_bug() in the backtrack_insn(), e.g. if rY ∈ [r1..r5] and backtrack_insn() spots a function call. The reproducer program is in the next patch. This was hit by sched_ext scx_lavd scheduler code. Changes in tests: - verifier_scalar_ids.c selftests need modification to preserve some registers as live for __msg() checks. - exceptions_assert.c adjusted to match changes in the verifier log, R0 is dead after conditional instruction and thus does not get range. - precise.c adjusted to match changes in the verifier log, register r9 is dead after comparison and it's range is not important for test. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Fixes: 0fb3cf6110a5 ("bpf: use register liveness information for func_states_equal") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12selftests/bpf: Fix OOB read in dmabuf_collectorT.J. Mercier1-1/+1
[ Upstream commit 6881af27f9ea0f5ca8f606f573ef5cc25ca31fe4 ] Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters, but bpf_probe_read_kernel always tries to read exactly that many bytes. If a name is less than DMA_BUF_NAME_LEN characters, bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str stops at the first NUL terminator so use it instead, like iter_dmabuf_for_each already does. Fixes: ae5d2c59ecd7 ("selftests/bpf: Add test for dmabuf_iter") Reported-by: Jerome Lee <jaewookl@quicinc.com> Signed-off-by: T.J. Mercier <tjmercier@google.com> Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27selftests/xsk: fix number of Tx frags in invalid packetLarysa Zaremba1-1/+1
[ Upstream commit 88af9fefed412e4bea9a1a771cbe6fe347fa3507 ] The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to an odd number. TOO_MANY_FRAGS test case contains an invalid packet consisting of (xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD flag set. This packet is expected to be dropped. After that, there is a valid linear packet, which is expected to be received back. Once (xdp_zc_max_segs) is an odd number, the last packet cannot be received, if packet forwarding between Rx and Tx interfaces relies on the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed, if all traffic is looped. Turns out, sending function processes multiple invalid frags as if they were in 2-frag packets. So once the invalid mbuf packet contains an odd number of those, the valid packet after gets paired with the previous invalid descriptor, and hence does not get an ethernet header generated, so it is either dropped or malformed. Make invalid packets in verbatim mode always have only a single frag. For such packets, number of frags is otherwise meaningless, as descriptor flags are pre-configured in verbatim mode and packet data is not generated for invalid descriptors. Fixes: 697604492b64 ("selftests/xsk: add invalid descriptor test for multi-buffer") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20260203155103.2305816-3-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27selftests/xsk: properly handle batch ending in the middle of a packetLarysa Zaremba1-0/+2
[ Upstream commit 42e41b2a0afa04ca49ee2725aadf90ccb058ed28 ] Referenced commit reduced the scope of the variable pkt, so now it has to be reinitialized via pkt_stream_get_next_rx_pkt(), which also increments some counters. When the packet is interrupted by the batch ending, pkt stream therefore proceeds to the next packet, while xsk ring still contains the previous one, this results in a pkt_nb mismatch. Decrement the affected counters when packet is interrupted. Fixes: 8913e653e9b8 ("selftests/xsk: Iterate over all the sockets in the receive pkts function") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20260203155103.2305816-2-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27selftests/bpf: Fix kprobe multi stacktrace_ips testJiri Olsa1-5/+14
[ Upstream commit 0207f94971e72a13380e28022c86da147e8e090f ] We now include the attached function in the stack trace, fixing the test accordingly. Fixes: c9e208fa93cd ("selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260126211837.472802-4-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27selftests/bpf: Fix resource leak in serial_test_wq on attach failureKery Qi1-2/+3
[ Upstream commit a32ae2658471dd87a2f7a438388ed7d9a5767212 ] When wq__attach() fails, serial_test_wq() returns early without calling wq__destroy(), leaking the skeleton resources allocated by wq__open_and_load(). This causes ASAN leak reports in selftests runs. Fix this by jumping to a common clean_up label that calls wq__destroy() on all exit paths after successful open_and_load. Note that the early return after wq__open_and_load() failure is correct and doesn't need fixing, since that function returns NULL on failure (after internally cleaning up any partial allocations). Fixes: 8290dba51910 ("selftests/bpf: wq: add bpf_wq_start() checks") Signed-off-by: Kery Qi <qikeyu2017@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27selftests/bpf: veristat: fix printing order in output_stats()Puranjay Mohan1-1/+1
[ Upstream commit c286e7e9d1f1f3d90ad11c37e896f582b02d19c4 ] The order of the variables in the printf() doesn't match the text and therefore veristat prints something like this: Done. Processed 24 files, 0 programs. Skipped 62 files, 0 programs. When it should print: Done. Processed 24 files, 62 programs. Skipped 0 files, 0 programs. Fix the order of variables in the printf() call. Fixes: 518fee8bfaf2 ("selftests/bpf: make veristat skip non-BPF and failing-to-open BPF objects") Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20251231221052.759396-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-06selftests/bpf: Update xdp_context_test_run test to check maximum metadata sizeToke Høiland-Jørgensen1-3/+11
Update the selftest to check that the metadata size check takes the xdp_frame size into account in bpf_prog_test_run. The original check (for meta size 256) was broken because the data frame supplied was smaller than this, triggering a different EINVAL return. So supply a larger data frame for this test to make sure we actually exercise the check we think we are. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260105114747.1358750-2-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-17Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds4-22/+139
Pull bpf fixes from Alexei Starovoitov: - Fix BPF builds due to -fms-extensions. selftests (Alexei Starovoitov), bpftool (Quentin Monnet). - Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n (Geert Uytterhoeven) - Fix livepatch/BPF interaction and support reliable unwinding through BPF stack frames (Josh Poimboeuf) - Do not audit capability check in arm64 JIT (Ondrej Mosnacek) - Fix truncated dmabuf BPF iterator reads (T.J. Mercier) - Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu) - Fix warnings in libbpf when built with -Wdiscarded-qualifiers under C23 (Mikhail Gavrilov) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: add regression test for bpf_d_path() bpf: Fix verifier assumptions of bpf_d_path's output buffer selftests/bpf: Add test for truncated dmabuf_iter reads bpf: Fix truncated dmabuf iterator reads x86/unwind/orc: Support reliable unwinding through BPF stack frames bpf: Add bpf_has_frame_pointer() bpf, arm64: Do not audit capability check in do_jit() libbpf: Fix -Wdiscarded-qualifiers under C23 bpftool: Fix build warnings due to MS extensions net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT selftests/bpf: Add -fms-extensions to bpf build flags
2025-12-10selftests/bpf: add regression test for bpf_d_path()Shuran Liu2-17/+95
Add a regression test for bpf_d_path() to cover incorrect verifier assumptions caused by an incorrect function prototype. The test attaches to the fallocate hook, calls bpf_d_path() and verifies that a simple prefix comparison on the returned pathname behaves correctly after the fix in patch 1. It ensures the verifier does not assume the buffer remains unwritten. Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10selftests/bpf: Add test for truncated dmabuf_iter readsT.J. Mercier1-5/+42
If many dmabufs are present, reads of the dmabuf iterator can be truncated at PAGE_SIZE or user buffer size boundaries before the fix in "bpf: Fix truncated dmabuf iterator reads". Add a test to confirm truncation does not occur. Signed-off-by: T.J. Mercier <tjmercier@google.com> Link: https://lore.kernel.org/r/20251204000348.1413593-2-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09selftests/bpf: replace "__auto_type" with "auto"H. Peter Anvin1-2/+7
Replace instances of "__auto_type" with "auto" in: tools/testing/selftests/bpf/prog_tests/socket_helpers.h This file does not seem to be including <linux/compiler_types.h> directly or indirectly, so copy the definition but guard it with !defined(auto). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-12-07Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko) fixes a build issue and does some cleanup in ib/sys_info.c - "Implement mul_u64_u64_div_u64_roundup()" (David Laight) enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich) makes BPF symbol names, sizes, and line numbers available to the GDB debugger - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang) adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu) adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations - "rbree: inline rb_first() and rb_last()" (Eric Dumazet) makes TCP a little faster - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin) reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin) increases the flexibility of KEXEC Handover. Also preparation for LUO - "Live Update Orchestrator" (Pasha Tatashin) is a major new feature targeted at cloud environments. Quoting the cover letter: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain) moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day - "kho: fixes for vmalloc restoration" (Mike Rapoport) fixes a BUG which was being hit during KHO restoration of vmalloc() regions * tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits) calibrate: update header inclusion Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()" vmcoreinfo: track and log recoverable hardware errors kho: fix restoring of contiguous ranges of order-0 pages kho: kho_restore_vmalloc: fix initialization of pages array MAINTAINERS: TPM DEVICE DRIVER: update the W-tag init: replace simple_strtoul with kstrtoul to improve lpj_setup KHO: fix boot failure due to kmemleak access to non-PRESENT pages Documentation/ABI: new kexec and kdump sysfs interface Documentation/ABI: mark old kexec sysfs deprecated kexec: move sysfs entries to /sys/kernel/kexec test_kho: always print restore status kho: free chunks using free_page() instead of kfree() selftests/liveupdate: add kexec test for multiple and empty sessions selftests/liveupdate: add simple kexec-based selftest for LUO selftests/liveupdate: add userspace API selftests docs: add documentation for memfd preservation via LUO mm: memfd_luo: allow preserving memfd liveupdate: luo_file: add private argument to store runtime state mm: shmem: export some functions to internal.h ...
2025-12-04selftests/bpf: Add -fms-extensions to bpf build flagsAlexei Starovoitov1-0/+2
The kernel is now built with -fms-extensions, therefore generated vmlinux.h contains types like: struct slab { .. struct freelist_counters; }; Use -fms-extensions and -Wno-microsoft-anon-tag flags to build bpf programs that #include "vmlinux.h" Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-04Merge tag 'net-next-6.19' of ↵Linus Torvalds10-168/+1275
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Replace busylock at the Tx queuing layer with a lockless list. Resulting in a 300% (4x) improvement on heavy TX workloads, sending twice the number of packets per second, for half the cpu cycles. - Allow constantly busy flows to migrate to a more suitable CPU/NIC queue. Normally we perform queue re-selection when flow comes out of idle, but under extreme circumstances the flows may be constantly busy. Add sysctl to allow periodic rehashing even if it'd risk packet reordering. - Optimize the NAPI skb cache, make it larger, use it in more paths. - Attempt returning Tx skbs to the originating CPU (like we already did for Rx skbs). - Various data structure layout and prefetch optimizations from Eric. - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly quite expensive on recent AMD machines. - Extend threaded NAPI polling to allow the kthread busy poll for packets. - Make MPTCP use Rx backlog processing. This lowers the lock pressure, improving the Rx performance. - Support memcg accounting of MPTCP socket memory. - Allow admin to opt sockets out of global protocol memory accounting (using a sysctl or BPF-based policy). The global limits are a poor fit for modern container workloads, where limits are imposed using cgroups. - Improve heuristics for when to kick off AF_UNIX garbage collection. - Allow users to control TCP SACK compression, and default to 33% of RTT. - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily aggressive rcvbuf growth and overshot when the connection RTT is low. - Preserve skb metadata space across skb_push / skb_pull operations. - Support for IPIP encapsulation in the nftables flowtable offload. - Support appending IP interface information to ICMP messages (RFC 5837). - Support setting max record size in TLS (RFC 8449). - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL. - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock. - Let users configure the number of write buffers in SMC. - Add new struct sockaddr_unsized for sockaddr of unknown length, from Kees. - Some conversions away from the crypto_ahash API, from Eric Biggers. - Some preparations for slimming down struct page. - YAML Netlink protocol spec for WireGuard. - Add a tool on top of YAML Netlink specs/lib for reporting commonly computed derived statistics and summarized system state. Driver API: - Add CAN XL support to the CAN Netlink interface. - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as defined by the OPEN Alliance's "Advanced diagnostic features for 100BASE-T1 automotive Ethernet PHYs" specification. - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x). - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec and performs RSS. - Add info to devlink params whether the current setting is the default or a user override. Allow resetting back to default. - Add standard device stats for PSP crypto offload. - Leverage DSA frame broadcast to implement simple HSR frame duplication for a lot of switches without dedicated HSR offload. - Add uAPI defines for 1.6Tbps link modes. Device drivers: - Add Motorcomm YT921x gigabit Ethernet switch support. - Add MUCSE driver for N500/N210 1GbE NIC series. - Convert drivers to support dedicated ops for timestamping control, and away from the direct IOCTL handling. While at it support GET operations for PHY timestamping. - Add (and convert most drivers to) a dedicated ethtool callback for reading the Rx ring count. - Significant refactoring efforts in the STMMAC driver, which supports Synopsys turn-key MAC IP integrated into a ton of SoCs. - Ethernet high-speed NICs: - Broadcom (bnxt): - support PPS in/out on all pins - Intel (100G, ice, idpf): - ice: implement standard ethtool and timestamping stats - i40e: support setting the max number of MAC addresses per VF - iavf: support RSS of GTP tunnels for 5G and LTE deployments - nVidia/Mellanox (mlx5): - reduce downtime on interface reconfiguration - disable being an XDP redirect target by default (same as other drivers) to avoid wasting resources if feature is unused - Meta (fbnic): - add support for Linux-managed PCS on 25G, 50G, and 100G links - Wangxun: - support Rx descriptor merge, and Tx head writeback - support Rx coalescing offload - support 25G SPF and 40G QSFP modules - Ethernet virtual: - Google (gve): - allow ethtool to configure rx_buf_len - implement XDP HW RX Timestamping support for DQ descriptor format - Microsoft vNIC (mana): - support HW link state events - handle hardware recovery events when probing the device - Ethernet NICs consumer, and embedded: - usbnet: add support for Byte Queue Limits (BQL) - AMD (amd-xgbe): - add device selftests - NXP (enetc): - add i.MX94 support - Broadcom integrated MACs (bcmgenet, bcmasp): - bcmasp: add support for PHY-based Wake-on-LAN - Broadcom switches (b53): - support port isolation - support BCM5389/97/98 and BCM63XX ARL formats - Lantiq/MaxLinear switches: - support bridge FDB entries on the CPU port - use regmap for register access - allow user to enable/disable learning - support Energy Efficient Ethernet - support configuring RMII clock delays - add tagging driver for MaxLinear GSW1xx switches - Synopsys (stmmac): - support using the HW clock in free running mode - add Eswin EIC7700 support - add Rockchip RK3506 support - add Altera Agilex5 support - Cadence (macb): - cleanup and consolidate descriptor and DMA address handling - add EyeQ5 support - TI: - icssg-prueth: support AF_XDP - Airoha access points: - add missing Ethernet stats and link state callback - add AN7583 support - support out-of-order Tx completion processing - Power over Ethernet: - pd692x0: preserve PSE configuration across reboots - add support for TPS23881B devices - Ethernet PHYs: - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support - Support 50G SerDes and 100G interfaces in Linux-managed PHYs - micrel: - support for non PTP SKUs of lan8814 - enable in-band auto-negotiation on lan8814 - realtek: - cable testing support on RTL8224 - interrupt support on RTL8221B - motorcomm: support for PHY LEDs on YT853 - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag - mscc: support for PHY LED control - CAN drivers: - m_can: add support for optional reset and system wake up - remove can_change_mtu() obsoleted by core handling - mcp251xfd: support GPIO controller functionality - Bluetooth: - add initial support for PASTa - WiFi: - split ieee80211.h file, it's way too big - improvements in VHT radiotap reporting, S1G, Channel Switch Announcement handling, rate tracking in mesh networks - improve multi-radio monitor mode support, and add a cfg80211 debugfs interface for it - HT action frame handling on 6 GHz - initial chanctx work towards NAN - MU-MIMO sniffer improvements - WiFi drivers: - RealTek (rtw89): - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - Intel: - iwlwifi: new sniffer API support - MediaTek (mt76): - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - Qualcomm/Atheros: - ath10k: factory test support - ath11k: TX power insertion support - ath12k: BSS color change support - ath12k: statistics improvements - brcmfmac: Acer A1 840 tablet quirk - rtl8xxxu: 40 MHz connection fixes/support" * tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits) net: page_pool: sanitise allocation order net: page pool: xa init with destroy on pp init net/mlx5e: Support XDP target xmit with dummy program net/mlx5e: Update XDP features in switch channels selftests/tc-testing: Test CAKE scheduler when enqueue drops packets net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop wireguard: netlink: generate netlink code wireguard: uapi: generate header with ynl-gen wireguard: uapi: move flag enums wireguard: uapi: move enum wg_cmd wireguard: netlink: add YNL specification selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py selftests: drv-net: introduce Iperf3Runner for measurement use cases selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive() Documentation: net: dsa: mention simple HSR offload helpers Documentation: net: dsa: mention availability of RedBox ...
2025-11-29selftests/bpf: do not hardcode target rate in test_tc_edt BPF programAlexis Lothoré (eBPF Foundation)2-3/+4
test_tc_edt currently defines the target rate in both the userspace and BPF parts. This value could be defined once in the userspace part if we make it able to configure the BPF program before starting the test. Add a target_rate variable in the BPF part, and make the userspace part set it to the desired rate before attaching the shaping program. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29selftests/bpf: remove test_tc_edt.shAlexis Lothoré (eBPF Foundation)2-102/+0
Now that test_tc_edt has been integrated in test_progs, remove the legacy shell script. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29selftests/bpf: integrate test_tc_edt into test_progsAlexis Lothoré (eBPF Foundation)2-1/+145
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX veth to shape the traffic to 5MBps. It then checks that the amount of received bytes (at interface level), compared to the TX duration, indeed matches 5Mbps. Convert this test script to the test_progs framework: - keep the double veth setup, isolated in two veths - run a small tcp server, and connect client to server - push a pre-configured amount of bytes, and measure how much time has been needed to push those - ensure that this rate is in a 2% error margin around the target rate This two percent value, while being tight, is hopefully large enough to not make the test too flaky in CI, while also turning it into a small example of BPF-based shaping. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29selftests/bpf: rename test_tc_edt.bpf.c section to expose program typeAlexis Lothoré (eBPF Foundation)2-2/+3
The test_tc_edt BPF program uses a custom section name, which works fine when manually loading it with tc, but prevents it from being loaded with libbpf. Update the program section name to "tc" to be able to manipulate it with a libbpf-based C test. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29selftests/bpf: Add success stats to rqspinlock stress testKumar Kartikeya Dwivedi1-12/+43
Add stats to observe the success and failure rate of lock acquisition attempts in various contexts. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29bpf: Remove runqslower toolHoyeon Lee3-20/+1
runqslower was added in commit 9c01546d26d2 "tools/bpf: Add runqslower tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf workflows. runqslower continues to live in BCC (libbpf-tools), so there is no need to keep building and maintaining it. Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and selftests accordingly. Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29selftests/bpf: Remove usage of lsm/file_alloc_security in selftestAmery Hung3-7/+7
file_alloc_security hook is disabled. Use other LSM hooks in selftests instead. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20251126202927.2584874-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29bpf: force BPF_F_RDONLY_PROG on insn array creationAnton Protopopov1-1/+1
The original implementation added a hack to check_mem_access() to prevent programs from writing into insn arrays. To get rid of this hack, enforce BPF_F_RDONLY_PROG on map creation. Also fix the corresponding selftest, as the error message changes with this patch. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28selftests: complete kselftest include centralizationBala-Vignesh-Reddy1-1/+1
This follow-up patch completes centralization of kselftest.h and ksefltest_harness.h includes in remaining seltests files, replacing all relative paths with a non-relative paths using shared -I include path in lib.mk Tested with gcc-13.3 and clang-18.1, and cross-compiled successfully on riscv, arm64, x86_64 and powerpc arch. [reddybalavignesh9979@gmail.com: add selftests include path for kselftest.h] Link: https://lkml.kernel.org/r/20251017090201.317521-1-reddybalavignesh9979@gmail.com Link: https://lkml.kernel.org/r/20251016104409.68985-1-reddybalavignesh9979@gmail.com Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/ Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mickael Salaun <mic@digikod.net> Cc: Ming Lei <ming.lei@redhat.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Simon Horman <horms@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-26tcp: remove icsk->icsk_retransmit_timerEric Dumazet2-4/+4
Now sk->sk_timer is no longer used by TCP keepalive, we can use its storage for TCP and MPTCP retransmit timers for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26tcp: introduce icsk->icsk_keepalive_timerEric Dumazet2-4/+4
sk->sk_timer has been used for TCP keepalives. Keepalive timers are not in fast path, we want to use sk->sk_timer storage for retransmit timers, for better cache locality. Create icsk->icsk_keepalive_timer and change keepalive code to no longer use sk->sk_timer. Added space is reclaimed in the following patch. This includes changes to MPTCP, which was also using sk_timer. Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer for inet_sk_diag_fill() sake. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26selftests/bpf: Make CS length configurable for rqspinlock stress testKumar Kartikeya Dwivedi1-2/+12
Allow users to configure the critical section delay for both task/normal and NMI contexts, and set to 20ms and 10ms as before by default. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-26selftests/bpf: Add lock wait time stats to rqspinlock stress testKumar Kartikeya Dwivedi1-0/+104
Add statistics per-CPU broken down by context and various timing windows for the time taken to acquire an rqspinlock. Cases where all acquisitions fit into the 10ms window are skipped from printing, otherwise the full breakdown is displayed when printing the summary. This allows capturing precisely the number of times outlier attempts happened for a given lock in a given context. A critical detail is that time is captured regardless of success or failure, which is important to capture events for failed but long waiting timeout attempts. Output: [ 64.279459] rqspinlock acquisition latency histogram (ms): [ 64.279472] cpu1: total 528426 (normal 526559, nmi 1867) [ 64.279477] 0-1ms: total 524697 (normal 524697, nmi 0) [ 64.279480] 2-2ms: total 3652 (normal 1811, nmi 1841) [ 64.279482] 3-3ms: total 66 (normal 47, nmi 19) [ 64.279485] 4-4ms: total 2 (normal 1, nmi 1) [ 64.279487] 5-5ms: total 1 (normal 1, nmi 0) [ 64.279489] 6-6ms: total 1 (normal 0, nmi 1) [ 64.279490] 101-150ms: total 1 (normal 0, nmi 1) [ 64.279492] >= 251ms: total 6 (normal 2, nmi 4) ... Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-26selftests/bpf: Relax CPU requirements for rqspinlock stress testKumar Kartikeya Dwivedi1-1/+1
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is calculated nicely by adding to the mode enum. Enables running single CPU AA tests. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-26selftests/bpf: Call bpf_get_numa_node_id() in trigger_count()Menglong Dong2-4/+6
The bench test "trig-kernel-count" can be used as a baseline comparison for fentry and other benchmarks, and the calling to bpf_get_numa_node_id() should be considered as composition of the baseline. So, let's call it in trigger_count(). Meanwhile, rename trigger_count() to trigger_kernel_count() to make it easier understand. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
2025-11-25selftests/bpf: Fix htab_update/reenter_update selftest failureSaket Kumar Bhaskar2-15/+41
Since commit 31158ad02ddb ("rqspinlock: Add deadlock detection and recovery") the updated path on re-entrancy now reports deadlock via -EDEADLK instead of the previous -EBUSY. Also, the way reentrancy was exercised (via fentry/lookup_elem_raw) has been fragile because lookup_elem_raw may be inlined (find_kernel_btf_id() will return -ESRCH). To fix this fentry is attached to bpf_obj_free_fields() instead of lookup_elem_raw() and: - The htab map is made to use a BTF-described struct val with a struct bpf_timer so that check_and_free_fields() reliably calls bpf_obj_free_fields() on element replacement. - The selftest is updated to do two updates to the same key (insert + replace) in prog_test. - The selftest is updated to align with expected errno with the kernel’s current behavior. Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20251117060752.129648-1-skb99@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24selftests/bpf: Allow selftests to build with older xxdAlan Maguire2-2/+5
Currently selftests require xxd with the "-n <name>" option which allows the user to specify a name not derived from the input object path. Instead of relying on this newer feature, older xxd can be used if we link our desired name ("test_progs_verification_cert") to the input object. Many distros ship xxd in vim-common package and do not have the latest xxd with -n support. Fixes: b720903e2b14d ("selftests/bpf: Enable signature verification for some lskel tests") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20251120084754.640405-3-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22selftests: bpf: Add tests for unbalanced rcu_read_lockPuranjay Mohan2-0/+42
As verifier now supports nested rcu critical sections, add new test cases to make sure unbalanced usage of rcu_read_lock()/unlock() is rejected. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251117200411.25563-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22bpf: support nested rcu critical sectionsPuranjay Mohan1-1/+1
Currently, nested rcu critical sections are rejected by the verifier and rcu_lock state is managed by a boolean variable. Add support for nested rcu critical sections by make active_rcu_locks a counter similar to active_preempt_locks. bpf_rcu_read_lock() increments this counter and bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition happens when active_rcu_locks drops to 0. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22bpf: test the correct stack liveness of tail callsEduard Zingerman1-0/+50
A new test is added: caller_stack_write_tail_call tests that the live stack is correctly tracked for a tail call. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Link: https://lore.kernel.org/r/20251119160355.1160932-5-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22bpf: test the proper verification of tail callsMartin Teichmann2-2/+90
Three tests are added: - invalidate_pkt_pointers_by_tail_call checks that one can use the packet pointer after a tail call. This was originally possible and also poses not problems, but was made impossible by 1a4607ffba35. - invalidate_pkt_pointers_by_static_tail_call tests a corner case found by Eduard Zingerman during the discussion of the original fix, which was broken in that fix. - subprog_result_tail_call tests that precision propagation works correctly across tail calls. This did not work before. Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251119160355.1160932-3-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22selftests/bpf: Update test_tag to use sha256Xing Guo1-1/+1
commit 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256") changed digest of prog_tag to SHA256 but forgot to update tests correspondingly. Fix it. Fixes: 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256") Signed-off-by: Xing Guo <higuoxing@gmail.com> Link: https://lore.kernel.org/r/20251121061458.3145167-1-higuoxing@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22selftests/bpf: Improve reliability of test_perf_branches_no_hw()Matt Bobrowski2-2/+17
Currently, test_perf_branches_no_hw() relies on the busy loop within test_perf_branches_common() being slow enough to allow at least one perf event sample tick to occur before starting to tear down the backing perf event BPF program. With a relatively small fixed iteration count of 1,000,000, this is not guaranteed on modern fast CPUs, resulting in the test run to subsequently fail with the following: bpf_testmod.ko is already unloaded. Loading bpf_testmod.ko... Successfully loaded bpf_testmod.ko. test_perf_branches_common:PASS:test_perf_branches_load 0 nsec test_perf_branches_common:PASS:attach_perf_event 0 nsec test_perf_branches_common:PASS:set_affinity 0 nsec check_good_sample:PASS:output not valid 0 nsec check_good_sample:PASS:read_branches_size 0 nsec check_good_sample:PASS:read_branches_stack 0 nsec check_good_sample:PASS:read_branches_stack 0 nsec check_good_sample:PASS:read_branches_global 0 nsec check_good_sample:PASS:read_branches_global 0 nsec check_good_sample:PASS:read_branches_size 0 nsec test_perf_branches_no_hw:PASS:perf_event_open 0 nsec test_perf_branches_common:PASS:test_perf_branches_load 0 nsec test_perf_branches_common:PASS:attach_perf_event 0 nsec test_perf_branches_common:PASS:set_affinity 0 nsec check_bad_sample:FAIL:output not valid no valid sample from prog Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED Successfully unloaded bpf_testmod.ko. On a modern CPU (i.e. one with a 3.5 GHz clock rate), executing 1 million increments of a volatile integer can take significantly less than 1 millisecond. If the spin loop and detachment of the perf event BPF program elapses before the first 1 ms sampling interval elapses, the perf event will never end up firing. Fix this by bumping the loop iteration counter a little within test_perf_branches_common(), along with ensuring adding another loop termination condition which is directly influenced by the backing perf event BPF program executing. Notably, a concious decision was made to not adjust the sample_freq value as that is just not a reliable way to go about fixing the problem. It effectively still leaves the race window open. Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest") Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22selftests/bpf: skip test_perf_branches_hw() on unsupported platformsMatt Bobrowski1-3/+3
Gracefully skip the test_perf_branches_hw subtest on platforms that do not support LBR or require specialized perf event attributes to enable branch sampling. For example, AMD's Milan (Zen 3) supports BRS rather than traditional LBR. This requires specific configurations (attr.type = PERF_TYPE_RAW, attr.config = RETIRED_TAKEN_BRANCH_INSTRUCTIONS) that differ from the generic setup used within this test. Notably, it also probably doesn't hold much value to special case perf event configurations for selected micro architectures. Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest") Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20251120142059.2836181-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-22selftests: bpf: Enable gotox tests from arm64Puranjay Mohan1-2/+2
arm64 JIT now supports gotox instruction and jumptables, so run tests in verifier_gotox.c for arm64. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251117130732.11107-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21selftests/bpf: Use sockaddr_storage instead of sa46 in select_reuseport testHoyeon Lee1-33/+34
The select_reuseport selftest uses a custom sa46 union to represent IPv4 and IPv6 addresses. This custom wrapper requires extra manual handling for address family and field extraction. Replace sa46 with sockaddr_storage and update the helper functions to operate on native socket structures. This simplifies the code and removes unnecessary custom address-handling logic. No functional changes intended. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251121081332.2309838-3-hoyeon.lee@suse.com
2025-11-21selftests/bpf: Use sockaddr_storage directly in cls_redirect testHoyeon Lee1-79/+43
The cls_redirect test uses a custom addr_port/tuple wrapper to represent IPv4/IPv6 addresses and ports. This custom wrapper requires extra conversion logic and specific helpers such as fill_addr_port(), which are no longer necessary when using standard socket address structures. This commit replaces addr_port/tuple with the standard sockaddr_storage so test handles address families and ports using native socket types. It removes the custom helper, eliminates redundant casts, and simplifies the setup helpers without functional changes. set_up_conn() and build_input() now take src/dst sockaddr_storage directly. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251121081332.2309838-2-hoyeon.lee@suse.com
2025-11-20selftests/bpf: Use ASSERT_STRNEQ to factor in long slab cache namesMatt Bobrowski1-1/+2
subtest_kmem_cache_iter_check_slabinfo() fundamentally compares slab cache names parsed out from /proc/slabinfo against those stored within struct kmem_cache_result. The current problem is that the slab cache name within struct kmem_cache_result is stored within a bounded fixed-length array (sized to SLAB_NAME_MAX(32)), whereas the name parsed out from /proc/slabinfo is not. Meaning, using ASSERT_STREQ() can certainly lead to test failures, particularly when dealing with slab cache names that are longer than SLAB_NAME_MAX(32) bytes. Notably, kmem_cache_create() allows callers to create slab caches with somewhat arbitrarily sized names via its __name identifier argument, so exceeding the SLAB_NAME_MAX(32) limit that is in place now can certainly happen. Make subtest_kmem_cache_iter_check_slabinfo() more reliable by only checking up to sizeof(struct kmem_cache_result.name) - 1 using ASSERT_STRNEQ(). Fixes: a496d0cdc84d ("selftests/bpf: Add a test for kmem_cache_iter") Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://patch.msgid.link/20251118073734.4188710-1-mattbobrowski@google.com
2025-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski13-12/+613
Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19selftests/bpf: Replace TCP CC string comparisons with bpf_strncmpHoyeon Lee2-28/+10
The connect4_prog and bpf_iter_setsockopt tests duplicate the same open-coded TCP congestion control string comparison logic. Since bpf_strncmp() provides the same functionality, use it instead to avoid repeated open-coded loops. This change applies only to functional BPF tests and does not affect the verifier performance benchmarks (veristat.cfg). No functional changes intended. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251115225550.1086693-5-hoyeon.lee@suse.com
2025-11-19selftests/bpf: Move common TCP helpers into bpf_tracing_net.hHoyeon Lee5-24/+14
Some BPF selftests contain identical copies of the min(), max(), before(), and after() helpers. These repeated snippets are the same across the tests and do not need to be defined separately. Move these helpers into bpf_tracing_net.h so they can be shared by TCP related BPF programs. This removes repeated code and keeps the helpers in a single place. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251115225550.1086693-4-hoyeon.lee@suse.com
2025-11-15selftests/bpf: Test bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) when ↵Martin KaFai Lau2-1/+34
transport_header is not set Add a test to check that bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) is rejected (-EINVAL) if skb->transport_header is not set. The test needs to lower the MTU of the loopback device. Thus, take this opportunity to run the test in a netns by adding "ns_" to the test name. The "serial_" prefix can then be removed. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251112232331.1566074-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>