| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit fbc7aef517d8765e4c425d2792409bb9bf2e1f13 ]
Same as in __reg64_deduce_bounds(), refine s32/u32 ranges
in __reg32_deduce_bounds() in the following situations:
- s32 range crosses U32_MAX/0 boundary, positive part of the s32 range
overlaps with u32 range:
0 U32_MAX
| [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxx s32 range xxxxxxxxx] [xxxxxxx|
0 S32_MAX S32_MIN -1
- s32 range crosses U32_MAX/0 boundary, negative part of the s32 range
overlaps with u32 range:
0 U32_MAX
| [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxxxxxx] [xxxxxxxxxxxx s32 range |
0 S32_MAX S32_MIN -1
- No refinement if ranges overlap in two intervals.
This helps for e.g. consider the following program:
call %[bpf_get_prandom_u32];
w0 &= 0xffffffff;
if w0 < 0x3 goto 1f; // on fall-through u32 range [3..U32_MAX]
if w0 s> 0x1 goto 1f; // on fall-through s32 range [S32_MIN..1]
if w0 s< 0x0 goto 1f; // range can be narrowed to [S32_MIN..-1]
r10 = 0;
1: ...;
The reg_bounds.c selftest is updated to incorporate identical logic,
refinement based on non-overflowing range halves:
((x ∩ [0, smax]) ∩ (y ∩ [0, smax])) ∪
((x ∩ [smin,-1]) ∩ (y ∩ [smin,-1]))
Reported-by: Andrea Righi <arighi@nvidia.com>
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Closes: https://lore.kernel.org/bpf/aakqucg4vcujVwif@gpd4/T/
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-1-f7f67e060a6b@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6c2128505f61b504c79a20b89596feba61388112 ]
process_bpf_exit_full() passes check_lock = !curframe to
check_resource_leak(), which is false in cases when bpf_throw() is
called from a static subprog. This makes check_resource_leak() to skip
validation of active_rcu_locks, active_preempt_locks, and
active_irq_id on exception exits from subprogs.
At runtime bpf_throw() unwinds the stack via ORC without releasing any
user-acquired locks, which may cause various issues as the result.
Fix by setting check_lock = true for exception exits regardless of
curframe, since exceptions bypass all intermediate frame
cleanup. Update the error message prefix to "bpf_throw" for exception
exits to distinguish them from normal BPF_EXIT.
Fix reject_subprog_with_rcu_read_lock test which was previously
passing for the wrong reason. Test program returned directly from the
subprog call without closing the RCU section, so the error was
triggered by the unclosed RCU lock on normal exit, not by
bpf_throw. Update __msg annotations for affected tests to match the
new "bpf_throw" error prefix.
The spin_lock case is not affected because they are already checked [1]
at the call site in do_check_insn() before bpf_throw can run.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v7.0-rc4#n21098
Assisted-by: Claude:claude-opus-4-6
Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260320000809.643798-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 024cea2d647ed8ab942f19544b892d324dba42b4 ]
The reg_bounds_crafted tests validate the verifier's range analysis
logic. They focus on the actual ranges and thus ignore the tnum. As a
consequence, they carry the assumption that the tested cases can be
reproduced in userspace without using the tnum information.
Unfortunately, the previous change the refinement logic breaks that
assumption for one test case:
(u64)2147483648 (u32)<op> [4294967294; 0x100000000]
The tested bytecode is shown below. Without our previous improvement, on
the false branch of the condition, R7 is only known to have u64 range
[0xfffffffe; 0x100000000]. With our improvement, and using the tnum
information, we can deduce that R7 equals 0x100000000.
19: (bc) w0 = w6 ; R6=0x80000000
20: (bc) w0 = w7 ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff))
21: (be) if w6 <= w7 goto pc+3 ; R6=0x80000000 R7=0x100000000
R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op
refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds
refines the s32 range to 0 using u32 and finally also sets u32=0.
From this, __reg_bound_offset improves the tnum to (0; 0x100000000).
Finally, our previous patch uses this new tnum to deduce that it only
intersect with u64=[0xfffffffe; 0x100000000] in a single value:
0x100000000.
Because the verifier uses the tnum to reach this constant value, the
selftest is unable to reproduce it by only simulating ranges. The
solution implemented in this patch is to change the test case such that
there is more than one overlap value between u64 and the tnum. The max.
u64 value is thus changed from 0x100000000 to 0x300000000.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2658a1720a1944fbaeda937000ad2b3c3dfaf1bb ]
Fix an inconsistency between func_states_equal() and
collect_linked_regs():
- regsafe() uses check_ids() to verify that cached and current states
have identical register id mapping.
- func_states_equal() calls regsafe() only for registers computed as
live by compute_live_registers().
- clean_live_states() is supposed to remove dead registers from cached
states, but it can skip states belonging to an iterator-based loop.
- collect_linked_regs() collects all registers sharing the same id,
ignoring the marks computed by compute_live_registers().
Linked registers are stored in the state's jump history.
- backtrack_insn() marks all linked registers for an instruction
as precise whenever one of the linked registers is precise.
The above might lead to a scenario:
- There is an instruction I with register rY known to be dead at I.
- Instruction I is reached via two paths: first A, then B.
- On path A:
- There is an id link between registers rX and rY.
- Checkpoint C is created at I.
- Linked register set {rX, rY} is saved to the jump history.
- rX is marked as precise at I, causing both rX and rY
to be marked precise at C.
- On path B:
- There is no id link between registers rX and rY,
otherwise register states are sub-states of those in C.
- Because rY is dead at I, check_ids() returns true.
- Current state is considered equal to checkpoint C,
propagate_precision() propagates spurious precision
mark for register rY along the path B.
- Depending on a program, this might hit verifier_bug()
in the backtrack_insn(), e.g. if rY ∈ [r1..r5]
and backtrack_insn() spots a function call.
The reproducer program is in the next patch.
This was hit by sched_ext scx_lavd scheduler code.
Changes in tests:
- verifier_scalar_ids.c selftests need modification to preserve
some registers as live for __msg() checks.
- exceptions_assert.c adjusted to match changes in the verifier log,
R0 is dead after conditional instruction and thus does not get
range.
- precise.c adjusted to match changes in the verifier log, register r9
is dead after comparison and it's range is not important for test.
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Fixes: 0fb3cf6110a5 ("bpf: use register liveness information for func_states_equal")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6881af27f9ea0f5ca8f606f573ef5cc25ca31fe4 ]
Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters,
but bpf_probe_read_kernel always tries to read exactly that many bytes.
If a name is less than DMA_BUF_NAME_LEN characters,
bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str
stops at the first NUL terminator so use it instead, like
iter_dmabuf_for_each already does.
Fixes: ae5d2c59ecd7 ("selftests/bpf: Add test for dmabuf_iter")
Reported-by: Jerome Lee <jaewookl@quicinc.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 88af9fefed412e4bea9a1a771cbe6fe347fa3507 ]
The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to
an odd number.
TOO_MANY_FRAGS test case contains an invalid packet consisting of
(xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD
flag set. This packet is expected to be dropped. After that, there is a
valid linear packet, which is expected to be received back.
Once (xdp_zc_max_segs) is an odd number, the last packet cannot be
received, if packet forwarding between Rx and Tx interfaces relies on
the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed,
if all traffic is looped.
Turns out, sending function processes multiple invalid frags as if they
were in 2-frag packets. So once the invalid mbuf packet contains an odd
number of those, the valid packet after gets paired with the previous
invalid descriptor, and hence does not get an ethernet header generated, so
it is either dropped or malformed.
Make invalid packets in verbatim mode always have only a single frag. For
such packets, number of frags is otherwise meaningless, as descriptor flags
are pre-configured in verbatim mode and packet data is not generated for
invalid descriptors.
Fixes: 697604492b64 ("selftests/xsk: add invalid descriptor test for multi-buffer")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 42e41b2a0afa04ca49ee2725aadf90ccb058ed28 ]
Referenced commit reduced the scope of the variable pkt, so now it has to
be reinitialized via pkt_stream_get_next_rx_pkt(), which also increments
some counters. When the packet is interrupted by the batch ending, pkt
stream therefore proceeds to the next packet, while xsk ring still contains
the previous one, this results in a pkt_nb mismatch.
Decrement the affected counters when packet is interrupted.
Fixes: 8913e653e9b8 ("selftests/xsk: Iterate over all the sockets in the receive pkts function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0207f94971e72a13380e28022c86da147e8e090f ]
We now include the attached function in the stack trace,
fixing the test accordingly.
Fixes: c9e208fa93cd ("selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-4-jolsa@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a32ae2658471dd87a2f7a438388ed7d9a5767212 ]
When wq__attach() fails, serial_test_wq() returns early without calling
wq__destroy(), leaking the skeleton resources allocated by
wq__open_and_load(). This causes ASAN leak reports in selftests runs.
Fix this by jumping to a common clean_up label that calls wq__destroy()
on all exit paths after successful open_and_load.
Note that the early return after wq__open_and_load() failure is correct
and doesn't need fixing, since that function returns NULL on failure
(after internally cleaning up any partial allocations).
Fixes: 8290dba51910 ("selftests/bpf: wq: add bpf_wq_start() checks")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c286e7e9d1f1f3d90ad11c37e896f582b02d19c4 ]
The order of the variables in the printf() doesn't match the text and
therefore veristat prints something like this:
Done. Processed 24 files, 0 programs. Skipped 62 files, 0 programs.
When it should print:
Done. Processed 24 files, 62 programs. Skipped 0 files, 0 programs.
Fix the order of variables in the printf() call.
Fixes: 518fee8bfaf2 ("selftests/bpf: make veristat skip non-BPF and failing-to-open BPF objects")
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251231221052.759396-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Update the selftest to check that the metadata size check takes the
xdp_frame size into account in bpf_prog_test_run. The original
check (for meta size 256) was broken because the data frame supplied was
smaller than this, triggering a different EINVAL return. So supply a
larger data frame for this test to make sure we actually exercise the
check we think we are.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix BPF builds due to -fms-extensions. selftests (Alexei
Starovoitov), bpftool (Quentin Monnet).
- Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n
(Geert Uytterhoeven)
- Fix livepatch/BPF interaction and support reliable unwinding through
BPF stack frames (Josh Poimboeuf)
- Do not audit capability check in arm64 JIT (Ondrej Mosnacek)
- Fix truncated dmabuf BPF iterator reads (T.J. Mercier)
- Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)
- Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
C23 (Mikhail Gavrilov)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: add regression test for bpf_d_path()
bpf: Fix verifier assumptions of bpf_d_path's output buffer
selftests/bpf: Add test for truncated dmabuf_iter reads
bpf: Fix truncated dmabuf iterator reads
x86/unwind/orc: Support reliable unwinding through BPF stack frames
bpf: Add bpf_has_frame_pointer()
bpf, arm64: Do not audit capability check in do_jit()
libbpf: Fix -Wdiscarded-qualifiers under C23
bpftool: Fix build warnings due to MS extensions
net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
selftests/bpf: Add -fms-extensions to bpf build flags
|
|
Add a regression test for bpf_d_path() to cover incorrect verifier
assumptions caused by an incorrect function prototype. The test
attaches to the fallocate hook, calls bpf_d_path() and verifies that
a simple prefix comparison on the returned pathname behaves correctly
after the fix in patch 1. It ensures the verifier does not assume
the buffer remains unwritten.
Co-developed-by: Zesen Liu <ftyg@live.com>
Signed-off-by: Zesen Liu <ftyg@live.com>
Co-developed-by: Peili Gao <gplhust955@gmail.com>
Signed-off-by: Peili Gao <gplhust955@gmail.com>
Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Shuran Liu <electronlsr@gmail.com>
Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If many dmabufs are present, reads of the dmabuf iterator can be
truncated at PAGE_SIZE or user buffer size boundaries before the fix in
"bpf: Fix truncated dmabuf iterator reads". Add a test to
confirm truncation does not occur.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20251204000348.1413593-2-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Replace instances of "__auto_type" with "auto" in:
tools/testing/selftests/bpf/prog_tests/socket_helpers.h
This file does not seem to be including <linux/compiler_types.h>
directly or indirectly, so copy the definition but guard it with
!defined(auto).
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in ib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
|
|
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct slab {
..
struct freelist_counters;
};
Use -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
|
|
test_tc_edt currently defines the target rate in both the userspace and
BPF parts. This value could be defined once in the userspace part if we
make it able to configure the BPF program before starting the test.
Add a target_rate variable in the BPF part, and make the userspace part
set it to the desired rate before attaching the shaping program.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Now that test_tc_edt has been integrated in test_progs, remove the
legacy shell script.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX
veth to shape the traffic to 5MBps. It then checks that the amount of
received bytes (at interface level), compared to the TX duration, indeed
matches 5Mbps.
Convert this test script to the test_progs framework:
- keep the double veth setup, isolated in two veths
- run a small tcp server, and connect client to server
- push a pre-configured amount of bytes, and measure how much time has
been needed to push those
- ensure that this rate is in a 2% error margin around the target rate
This two percent value, while being tight, is hopefully large enough to
not make the test too flaky in CI, while also turning it into a small
example of BPF-based shaping.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test_tc_edt BPF program uses a custom section name, which works fine
when manually loading it with tc, but prevents it from being loaded with
libbpf.
Update the program section name to "tc" to be able to manipulate it with
a libbpf-based C test.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add stats to observe the success and failure rate of lock acquisition
attempts in various contexts.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
runqslower was added in commit 9c01546d26d2 "tools/bpf: Add runqslower
tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf
workflows. runqslower continues to live in BCC (libbpf-tools), so there
is no need to keep building and maintaining it.
Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and
selftests accordingly.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
file_alloc_security hook is disabled. Use other LSM hooks in selftests
instead.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20251126202927.2584874-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The original implementation added a hack to check_mem_access()
to prevent programs from writing into insn arrays. To get rid
of this hack, enforce BPF_F_RDONLY_PROG on map creation.
Also fix the corresponding selftest, as the error message changes
with this patch.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This follow-up patch completes centralization of kselftest.h and
ksefltest_harness.h includes in remaining seltests files, replacing all
relative paths with a non-relative paths using shared -I include path in
lib.mk
Tested with gcc-13.3 and clang-18.1, and cross-compiled successfully on
riscv, arm64, x86_64 and powerpc arch.
[reddybalavignesh9979@gmail.com: add selftests include path for kselftest.h]
Link: https://lkml.kernel.org/r/20251017090201.317521-1-reddybalavignesh9979@gmail.com
Link: https://lkml.kernel.org/r/20251016104409.68985-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mickael Salaun <mic@digikod.net>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for TCP and MPTCP retransmit timers for better
cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sk->sk_timer has been used for TCP keepalives.
Keepalive timers are not in fast path, we want to use sk->sk_timer
storage for retransmit timers, for better cache locality.
Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.
Added space is reclaimed in the following patch.
This includes changes to MPTCP, which was also using sk_timer.
Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow users to configure the critical section delay for both task/normal
and NMI contexts, and set to 20ms and 10ms as before by default.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add statistics per-CPU broken down by context and various timing windows
for the time taken to acquire an rqspinlock. Cases where all
acquisitions fit into the 10ms window are skipped from printing,
otherwise the full breakdown is displayed when printing the summary.
This allows capturing precisely the number of times outlier attempts
happened for a given lock in a given context.
A critical detail is that time is captured regardless of success or
failure, which is important to capture events for failed but long
waiting timeout attempts.
Output:
[ 64.279459] rqspinlock acquisition latency histogram (ms):
[ 64.279472] cpu1: total 528426 (normal 526559, nmi 1867)
[ 64.279477] 0-1ms: total 524697 (normal 524697, nmi 0)
[ 64.279480] 2-2ms: total 3652 (normal 1811, nmi 1841)
[ 64.279482] 3-3ms: total 66 (normal 47, nmi 19)
[ 64.279485] 4-4ms: total 2 (normal 1, nmi 1)
[ 64.279487] 5-5ms: total 1 (normal 1, nmi 0)
[ 64.279489] 6-6ms: total 1 (normal 0, nmi 1)
[ 64.279490] 101-150ms: total 1 (normal 0, nmi 1)
[ 64.279492] >= 251ms: total 6 (normal 2, nmi 4)
...
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is
calculated nicely by adding to the mode enum. Enables running single CPU
AA tests.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the calling to bpf_get_numa_node_id()
should be considered as composition of the baseline. So, let's call it in
trigger_count(). Meanwhile, rename trigger_count() to
trigger_kernel_count() to make it easier understand.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
|
|
Since commit 31158ad02ddb ("rqspinlock: Add deadlock detection
and recovery") the updated path on re-entrancy now reports deadlock
via -EDEADLK instead of the previous -EBUSY.
Also, the way reentrancy was exercised (via fentry/lookup_elem_raw)
has been fragile because lookup_elem_raw may be inlined
(find_kernel_btf_id() will return -ESRCH).
To fix this fentry is attached to bpf_obj_free_fields() instead of
lookup_elem_raw() and:
- The htab map is made to use a BTF-described struct val with a
struct bpf_timer so that check_and_free_fields() reliably calls
bpf_obj_free_fields() on element replacement.
- The selftest is updated to do two updates to the same key (insert +
replace) in prog_test.
- The selftest is updated to align with expected errno with the
kernel’s current behavior.
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20251117060752.129648-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently selftests require xxd with the "-n <name>" option
which allows the user to specify a name not derived from
the input object path. Instead of relying on this newer
feature, older xxd can be used if we link our desired name
("test_progs_verification_cert") to the input object.
Many distros ship xxd in vim-common package and do not have
the latest xxd with -n support.
Fixes: b720903e2b14d ("selftests/bpf: Enable signature verification for some lskel tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20251120084754.640405-3-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As verifier now supports nested rcu critical sections, add new test
cases to make sure unbalanced usage of rcu_read_lock()/unlock() is
rejected.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, nested rcu critical sections are rejected by the verifier and
rcu_lock state is managed by a boolean variable. Add support for nested
rcu critical sections by make active_rcu_locks a counter similar to
active_preempt_locks. bpf_rcu_read_lock() increments this counter and
bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition
happens when active_rcu_locks drops to 0.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A new test is added: caller_stack_write_tail_call tests that the live
stack is correctly tracked for a tail call.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Link: https://lore.kernel.org/r/20251119160355.1160932-5-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Three tests are added:
- invalidate_pkt_pointers_by_tail_call checks that one can use the
packet pointer after a tail call. This was originally possible
and also poses not problems, but was made impossible by 1a4607ffba35.
- invalidate_pkt_pointers_by_static_tail_call tests a corner case
found by Eduard Zingerman during the discussion of the original fix,
which was broken in that fix.
- subprog_result_tail_call tests that precision propagation works
correctly across tail calls. This did not work before.
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251119160355.1160932-3-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
commit 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
changed digest of prog_tag to SHA256 but forgot to update tests
correspondingly. Fix it.
Fixes: 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Link: https://lore.kernel.org/r/20251121061458.3145167-1-higuoxing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, test_perf_branches_no_hw() relies on the busy loop within
test_perf_branches_common() being slow enough to allow at least one
perf event sample tick to occur before starting to tear down the
backing perf event BPF program. With a relatively small fixed
iteration count of 1,000,000, this is not guaranteed on modern fast
CPUs, resulting in the test run to subsequently fail with the
following:
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:PASS:output not valid 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
test_perf_branches_no_hw:PASS:perf_event_open 0 nsec
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_bad_sample:FAIL:output not valid no valid sample from prog
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.
On a modern CPU (i.e. one with a 3.5 GHz clock rate), executing 1
million increments of a volatile integer can take significantly less
than 1 millisecond. If the spin loop and detachment of the perf event
BPF program elapses before the first 1 ms sampling interval elapses,
the perf event will never end up firing. Fix this by bumping the loop
iteration counter a little within test_perf_branches_common(), along
with ensuring adding another loop termination condition which is
directly influenced by the backing perf event BPF program
executing. Notably, a concious decision was made to not adjust the
sample_freq value as that is just not a reliable way to go about
fixing the problem. It effectively still leaves the race window open.
Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Gracefully skip the test_perf_branches_hw subtest on platforms that
do not support LBR or require specialized perf event attributes
to enable branch sampling.
For example, AMD's Milan (Zen 3) supports BRS rather than traditional
LBR. This requires specific configurations (attr.type = PERF_TYPE_RAW,
attr.config = RETIRED_TAKEN_BRANCH_INSTRUCTIONS) that differ from the
generic setup used within this test. Notably, it also probably doesn't
hold much value to special case perf event configurations for selected
micro architectures.
Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251120142059.2836181-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
arm64 JIT now supports gotox instruction and jumptables, so run tests in
verifier_gotox.c for arm64.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251117130732.11107-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The select_reuseport selftest uses a custom sa46 union to represent
IPv4 and IPv6 addresses. This custom wrapper requires extra manual
handling for address family and field extraction.
Replace sa46 with sockaddr_storage and update the helper functions to
operate on native socket structures. This simplifies the code and
removes unnecessary custom address-handling logic. No functional
changes intended.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-3-hoyeon.lee@suse.com
|
|
The cls_redirect test uses a custom addr_port/tuple wrapper to represent
IPv4/IPv6 addresses and ports. This custom wrapper requires extra
conversion logic and specific helpers such as fill_addr_port(), which
are no longer necessary when using standard socket address structures.
This commit replaces addr_port/tuple with the standard sockaddr_storage
so test handles address families and ports using native socket types.
It removes the custom helper, eliminates redundant casts, and simplifies
the setup helpers without functional changes. set_up_conn() and
build_input() now take src/dst sockaddr_storage directly.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-2-hoyeon.lee@suse.com
|
|
subtest_kmem_cache_iter_check_slabinfo() fundamentally compares slab
cache names parsed out from /proc/slabinfo against those stored within
struct kmem_cache_result. The current problem is that the slab cache
name within struct kmem_cache_result is stored within a bounded
fixed-length array (sized to SLAB_NAME_MAX(32)), whereas the name
parsed out from /proc/slabinfo is not. Meaning, using ASSERT_STREQ()
can certainly lead to test failures, particularly when dealing with
slab cache names that are longer than SLAB_NAME_MAX(32)
bytes. Notably, kmem_cache_create() allows callers to create slab
caches with somewhat arbitrarily sized names via its __name identifier
argument, so exceeding the SLAB_NAME_MAX(32) limit that is in place
now can certainly happen.
Make subtest_kmem_cache_iter_check_slabinfo() more reliable by only
checking up to sizeof(struct kmem_cache_result.name) - 1 using
ASSERT_STRNEQ().
Fixes: a496d0cdc84d ("selftests/bpf: Add a test for kmem_cache_iter")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20251118073734.4188710-1-mattbobrowski@google.com
|
|
Cross-merge networking fixes after downstream PR (net-6.18-rc7).
No conflicts, adjacent changes:
tools/testing/selftests/net/af_unix/Makefile
e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The connect4_prog and bpf_iter_setsockopt tests duplicate the same
open-coded TCP congestion control string comparison logic. Since
bpf_strncmp() provides the same functionality, use it instead to
avoid repeated open-coded loops.
This change applies only to functional BPF tests and does not affect
the verifier performance benchmarks (veristat.cfg). No functional
changes intended.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-5-hoyeon.lee@suse.com
|
|
Some BPF selftests contain identical copies of the min(), max(),
before(), and after() helpers. These repeated snippets are the same
across the tests and do not need to be defined separately.
Move these helpers into bpf_tracing_net.h so they can be shared by
TCP related BPF programs. This removes repeated code and keeps the
helpers in a single place.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-4-hoyeon.lee@suse.com
|
|
transport_header is not set
Add a test to check that bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) is
rejected (-EINVAL) if skb->transport_header is not set. The test
needs to lower the MTU of the loopback device. Thus, take this
opportunity to run the test in a netns by adding "ns_" to the test
name. The "serial_" prefix can then be removed.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251112232331.1566074-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|