Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit de07b182899227d5fd1ca7a1a7d495ecd453d49c ]
The strncmp benchmark uses the bpf_strncmp helper and a hand-written
loop to compare two strings. The values of the strings are filled from
userspace. One of the strings is non-const (in .bss) while the other is
const (in .rodata) since that is the requirement of bpf_strncmp.
The problem is that in the hand-written loop, Clang optimizes the reads
from the const string to always return 0 which breaks the benchmark.
Use barrier_var to prevent the optimization.
The effect can be seen on the strncmp-no-helper variant.
Before this change:
# ./bench strncmp-no-helper
Setting up benchmark 'strncmp-no-helper'...
Benchmark 'strncmp-no-helper' started.
Iter 0 (112.309us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 1 (-23.238us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 2 ( 58.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 3 (-30.466us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 4 ( 29.996us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 5 ( 16.949us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 6 (-60.035us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s
After this change:
# ./bench strncmp-no-helper
Setting up benchmark 'strncmp-no-helper'...
Benchmark 'strncmp-no-helper' started.
Iter 0 ( 77.711us): hits 5.534M/s ( 5.534M/prod), drops 0.000M/s, total operations 5.534M/s
Iter 1 ( 11.215us): hits 6.006M/s ( 6.006M/prod), drops 0.000M/s, total operations 6.006M/s
Iter 2 (-14.253us): hits 5.931M/s ( 5.931M/prod), drops 0.000M/s, total operations 5.931M/s
Iter 3 ( 59.087us): hits 6.005M/s ( 6.005M/prod), drops 0.000M/s, total operations 6.005M/s
Iter 4 (-21.379us): hits 6.010M/s ( 6.010M/prod), drops 0.000M/s, total operations 6.010M/s
Iter 5 (-20.310us): hits 5.861M/s ( 5.861M/prod), drops 0.000M/s, total operations 5.861M/s
Iter 6 ( 53.937us): hits 6.004M/s ( 6.004M/prod), drops 0.000M/s, total operations 6.004M/s
Summary: hits 5.969 ± 0.061M/s ( 5.969M/prod), drops 0.000 ± 0.000M/s, total operations 5.969 ± 0.061M/s
Fixes: 9c42652f8be3 ("selftests/bpf: Add benchmark for bpf_strncmp() helper")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20250313122852.1365202-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cbd8730aea8d79cda6b0f3c18b406dfdef0c1b80 ]
The verifier log when leaking resources on BPF_EXIT may be a bit
confusing, as it's a problem only when finally existing from the main
prog, not from any of the subprogs. Hence, update the verifier error
string and the corresponding selftests matching on it.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4d33dc1bc31df80356c49e40dbd3ddff19500bcb ]
With CONFIG_KPROBES_ON_FTRACE enabled on powerpc, ftrace_location_range
returns ftrace location for bpf_fentry_test1 at offset of 4 bytes from
function entry. This is because branch to _mcount function is at offset
of 4 bytes in function profile sequence.
To fix this, add entry_offset of 4 bytes while verifying the address for
kprobe entry address of bpf_fentry_test1 in verify_perf_link_info in
selftest, when CONFIG_KPROBES_ON_FTRACE is enabled.
Disassemble of bpf_fentry_test1:
c000000000e4b080 <bpf_fentry_test1>:
c000000000e4b080: a6 02 08 7c mflr r0
c000000000e4b084: b9 e2 22 4b bl c00000000007933c <_mcount>
c000000000e4b088: 01 00 63 38 addi r3,r3,1
c000000000e4b08c: b4 07 63 7c extsw r3,r3
c000000000e4b090: 20 00 80 4e blr
When CONFIG_PPC_FTRACE_OUT_OF_LINE [1] is enabled, these function profile
sequence is moved out of line with an unconditional branch at offset 0.
So, the test works without altering the offset for
'CONFIG_KPROBES_ON_FTRACE && CONFIG_PPC_FTRACE_OUT_OF_LINE' case.
Disassemble of bpf_fentry_test1:
c000000000f95190 <bpf_fentry_test1>:
c000000000f95190: 00 00 00 60 nop
c000000000f95194: 01 00 63 38 addi r3,r3,1
c000000000f95198: b4 07 63 7c extsw r3,r3
c000000000f9519c: 20 00 80 4e blr
[1] https://lore.kernel.org/all/20241030070850.1361304-13-hbathini@linux.ibm.com/
Fixes: 23cf7aa539dc ("selftests/bpf: Add selftest for fill_link_info")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241209065720.234344-1-skb99@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Similarly to the previous test, we also need a test case to cover
positive offsets as well, TC is an excellent hook for this.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Zijian Zhang <zijianzhang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-5-xiyou.wangcong@gmail.com
|
|
As requested by Daniel, we need to add a selftest to cover
bpf_skb_change_tail() cases in skb_verdict. Here we test trimming,
growing and error cases, and validate its expected return values and the
expected sizes of the payload.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-3-xiyou.wangcong@gmail.com
|
|
Add tests to ensure that arguments are correctly marked based on their
specified positions, and whether they get marked correctly as maybe
null. For modules, all tracepoint parameters should be marked
PTR_MAYBE_NULL by default.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL check branch to be dead code eliminated.
A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps. This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].
Fix this by maintaining an explicit list of tracepoints where the
arguments are known to be NULL, and mark the positional arguments as
PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
are known to be ERR_PTR, and mark these arguments as scalar values to
prevent potential dereference.
Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
shifted by the zero-indexed argument number x 4. This can be represented
as follows:
1st arg: 0x1
2nd arg: 0x10
3rd arg: 0x100
... and so on (likewise for ERR_PTR case).
In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Each
compilation unit will be analyzed and results will be collated to find
whether a tracepoint pointer is definitely not null, maybe null, or an
unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
A proof of concept of this tool from Eduard is available at [3].
Note that in case we don't find a specification in the raw_tp_null_args
array and the tracepoint belongs to a kernel module, we will
conservatively mark the arguments as PTR_MAYBE_NULL. This is because
unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
freely to the tracepoint. We don't protect against such tracepoints
passing ERR_PTR (which is uncommon anyway), lest we mark all such
arguments as SCALAR_VALUE.
While we are it, let's adjust the test raw_tp_null to not perform
dereference of the skb->mark, as that won't be allowed anymore, and make
it more robust by using inline assembly to test the dead code
elimination behavior, which should still stay the same.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
[1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
[2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
[3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch reverts commit
cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
patch was well-intended and meant to be as a stop-gap fixing branch
prediction when the pointer may actually be NULL at runtime. Eventually,
it was supposed to be replaced by an automated script or compiler pass
detecting possibly NULL arguments and marking them accordingly.
However, it caused two main issues observed for production programs and
failed to preserve backwards compatibility. First, programs relied on
the verifier not exploring == NULL branch when pointer is not NULL, thus
they started failing with a 'dereference of scalar' error. Next,
allowing raw_tp arguments to be modified surfaced the warning in the
verifier that warns against reg->off when PTR_MAYBE_NULL is set.
More information, context, and discusson on both problems is available
in [0]. Overall, this approach had several shortcomings, and the fixes
would further complicate the verifier's logic, and the entire masking
scheme would have to be removed eventually anyway.
Hence, revert the patch in preparation of a better fix avoiding these
issues to replace this commit.
[0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
Reported-by: Manu Bretelle <chantra@meta.com>
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Ensure that performing narrow ctx loads other than size == 8 are
rejected when the argument is a pointer type.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241212092050.3204165-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Robert Morris reported the following program type which passes the
verifier in [0]:
SEC("struct_ops/bpf_cubic_init")
void BPF_PROG(bpf_cubic_init, struct sock *sk)
{
asm volatile("r2 = *(u16*)(r1 + 0)"); // verifier should demand u64
asm volatile("*(u32 *)(r2 +1504) = 0"); // 1280 in some configs
}
The second line may or may not work, but the first instruction shouldn't
pass, as it's a narrow load into the context structure of the struct ops
callback. The code falls back to btf_ctx_access to ensure correctness
and obtaining the types of pointers. Ensure that the size of the access
is correctly checked to be 8 bytes, otherwise the verifier thinks the
narrow load obtained a trusted BTF pointer and will permit loads/stores
as it sees fit.
Perform the check on size after we've verified that the load is for a
pointer field, as for scalar values narrow loads are fine. Access to
structs passed as arguments to a BPF program are also treated as
scalars, therefore no adjustment is needed in their case.
Existing verifier selftests are broken by this change, but because they
were incorrect. Verifier tests for d_path were performing narrow load
into context to obtain path pointer, had this program actually run it
would cause a crash. The same holds for verifier_btf_ctx_access tests.
[0]: https://lore.kernel.org/bpf/51338.1732985814@localhost
Fixes: 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF")
Reported-by: Robert Morris <rtm@mit.edu>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241212092050.3204165-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Extend changes_pkt_data tests with test cases freplacing the main
program that does not have subprograms. Try four combinations when
both main program and replacement do and do not change packet data.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241212070711.427443-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a test case with a tail call done from a global sub-program. Such
tails calls should be considered as invalidating packet pointers.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Tail-called programs could execute any of the helpers that invalidate
packet pointers. Hence, conservatively assume that each tail call
invalidates packet pointers.
Making the change in bpf_helper_changes_pkt_data() automatically makes
use of check_cfg() logic that computes 'changes_pkt_data' effect for
global sub-programs, such that the following program could be
rejected:
int tail_call(struct __sk_buff *sk)
{
bpf_tail_call_static(sk, &jmp_table, 0);
return 0;
}
SEC("tc")
int not_safe(struct __sk_buff *sk)
{
int *p = (void *)(long)sk->data;
... make p valid ...
tail_call(sk);
*p = 42; /* this is unsafe */
...
}
The tc_bpf2bpf.c:subprog_tc() needs change: mark it as a function that
can invalidate packet pointers. Otherwise, it can't be freplaced with
tailcall_freplace.c:entry_freplace() that does a tail call.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Try different combinations of global functions replacement:
- replace function that changes packet data with one that doesn't;
- replace function that changes packet data with one that does;
- replace function that doesn't change packet data with one that does;
- replace function that doesn't change packet data with one that doesn't;
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check if verifier is aware of packet pointers invalidation done in
global functions. Based on a test shared by Nick Zavaritsky in [0].
[0] https://lore.kernel.org/bpf/0498CA22-5779-4767-9C0C-A9515CEA711F@gmail.com/
Suggested-by: Nick Zavaritsky <mejedi@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a test case to verify that without CAP_PERFMON, the test now
succeeds instead of failing due to a verification error.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Ensure that when CAP_PERFMON is dropped, and the verifier sees
allow_ptr_leaks as false, we are not permitted to read from a
STACK_INVALID slot. Without the fix, the test will report unexpected
success in loading.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a __caps_unpriv annotation so that tests requiring specific
capabilities while dropping the rest can conveniently specify them
during selftest declaration instead of munging with capabilities at
runtime from the testing binary.
While at it, let us convert test_verifier_mtu to use this new support
instead.
Since we do not want to include linux/capability.h, we only defined the
four main capabilities BPF subsystem deals with in bpf_misc.h for use in
tests. If the user passes a CAP_SYS_NICE or anything else that's not
defined in the header, capability parsing code will return a warning.
Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to
use it, and strtol doesn't errno on failed conversion. Fail the test in
such a case.
The original diff for this idea is available at link [0].
[0]: https://lore.kernel.org/bpf/a1e48f5d9ae133e19adc6adf27e19d585e06bab4.camel@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
[ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Andrii spotted that process_dynptr_func's rejection of incorrect
argument register type will print an error string where argument numbers
are not zero-indexed, unlike elsewhere in the verifier. Fix this by
subtracting 1 from regno. The same scenario exists for iterator
messages. Fix selftest error strings that match on the exact argument
number while we're at it to ensure clean bisection.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203002235.3776418-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add selftests to cover argument type check for iterator kfuncs, and
cover all three kinds (new, next, destroy). Without the fix in the
previous patch, the selftest would not cause a verifier error.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203000238.3602922-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, KF_ARG_PTR_TO_ITER handling missed checking the reg->type and
ensuring it is PTR_TO_STACK. Instead of enforcing this in the caller of
process_iter_arg, move the check into it instead so that all callers
will gain the check by default. This is similar to process_dynptr_func.
An existing selftest in verifier_bits_iter.c fails due to this change,
but it's because it was passing a NULL pointer into iter_next helper and
getting an error further down the checks, but probably meant to pass an
uninitialized iterator on the stack (as is done in the subsequent test
below it). We will gain coverage for non-PTR_TO_STACK arguments in later
patches hence just change the declaration to zero-ed stack object.
Fixes: 06accc8779c1 ("bpf: add support for open-coded iterator loops")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Lyu <tao.lyu@epfl.ch>
[ Kartikeya: move check into process_iter_arg, rewrite commit log ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203000238.3602922-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
CONFIG_PREEMPT is a preemtion model the so called "Low-Latency Desktop".
A different preemption model is PREEMPT_RT the so called "Real-Time".
Both implement preemption in kernel and set CONFIG_PREEMPTION.
There is also the so called "LAZY PREEMPT" which the "Scheduler
controlled preemption model". Here we have also preemption in the kernel
the rules are slightly different.
Therefore the testsuite should not check for CONFIG_PREEMPT (as one
model) but for CONFIG_PREEMPTION to figure out if preemption in the
kernel is possible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241119161819.qvEcs-n_@linutronix.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
|
|
llvm 19 fails to compile arena self test:
CLNG-BPF [test_progs] verifier_arena_large.bpf.o
progs/verifier_arena_large.c:90:24: error: unsupported signed division, please convert to unsigned div/mod.
90 | pg_idx = (pg - base) / PAGE_SIZE;
Though llvm <= 18 and llvm >= 20 don't have this issue,
fix the test to avoid the build error.
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc8).
Conflicts:
tools/testing/selftests/net/.gitignore
252e01e68241 ("selftests: net: add netlink-dumps to .gitignore")
be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw")
https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/
drivers/net/phy/phylink.c
671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled")
7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"")
Adjacent changes:
drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c
5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines")
e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test that verifies specific behavior of arena range tree
algorithm and adjust existing big_alloc1 test due to use
of global data in arena.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20241108025616.17625-3-alexei.starovoitov@gmail.com
|
|
Cross-merge bpf fixes after downstream PR.
In particular to bring the fix in
commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long").
The follow up verifier work depends on it.
And the fix in
commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator").
It's fixing instability of BPF CI on s390 arch.
No conflicts.
Adjacent changes in:
Auto-merging arch/Kconfig
Auto-merging kernel/bpf/helpers.c
Auto-merging kernel/bpf/memalloc.c
Auto-merging kernel/bpf/verifier.c
Auto-merging mm/slab_common.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add three tests for struct_ops using private stack.
./test_progs -t struct_ops_private_stack
#336/1 struct_ops_private_stack/private_stack:OK
#336/2 struct_ops_private_stack/private_stack_fail:OK
#336/3 struct_ops_private_stack/private_stack_recur:OK
#336 struct_ops_private_stack:OK
The following is a snippet of a struct_ops check_member() implementation:
u32 moff = __btf_member_bit_offset(t, member) / 8;
switch (moff) {
case offsetof(struct bpf_testmod_ops3, test_1):
prog->aux->priv_stack_requested = true;
prog->aux->recursion_detected = test_1_recursion_detected;
fallthrough;
default:
break;
}
return 0;
The first test is with nested two different callback functions where the
first prog has more than 512 byte stack size (including subprogs) with
private stack enabled.
The second test is a negative test where the second prog has more than 512
byte stack size without private stack enabled.
The third test is the same callback function recursing itself. At run time,
the jit trampoline recursion check kicks in to prevent the recursion. The
recursion_detected() callback function is implemented by the bpf_testmod,
the following message in dmesg
bpf_testmod: oh no, recursing into test_1, recursion_misses 1
demonstrates the callback function is indeed triggered when recursion miss
happens.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163938.2225528-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some private stack tests are added including:
- main prog only with stack size greater than BPF_PSTACK_MIN_SIZE.
- main prog only with stack size smaller than BPF_PSTACK_MIN_SIZE.
- prog with one subprog having MAX_BPF_STACK stack size and another
subprog having non-zero small stack size.
- prog with callback function.
- prog with exception in main prog or subprog.
- prog with async callback without nesting
- prog with async callback with possible nesting
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163927.2224750-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add test cases to verify the following four update operations on htab of
maps don't trigger lockdep warning:
(1) add then delete
(2) add, overwrite, then delete
(3) add, then lookup_and_delete
(4) add two elements, then lookup_and_delete_batch
Test cases are added for pre-allocated and non-preallocated htab of maps
respectively.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241106063542.357743-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Adding uprobe session consumers to the consumer test,
so we get the session into the test mix.
In addition scaling down the test to have just 1 uprobe
and 1 uretprobe, otherwise the test time grows and is
unsuitable for CI even with threads.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-13-jolsa@kernel.org
|
|
Testing that the session ret_handler bypass works on single
uprobe with multiple consumers, each with different session
ignore return value.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-12-jolsa@kernel.org
|
|
Making sure kprobe.session program can return only [0,1] values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-11-jolsa@kernel.org
|
|
Making sure uprobe.session program can return only [0,1] values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-10-jolsa@kernel.org
|
|
Adding uprobe session test that verifies the cookie value is stored
properly when single uprobe-ed function is executed recursively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-9-jolsa@kernel.org
|
|
Adding uprobe session test that verifies the cookie value
get properly propagated from entry to return program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-8-jolsa@kernel.org
|
|
Adding uprobe session test and testing that the entry program
return value controls execution of the return probe program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-7-jolsa@kernel.org
|
|
As reported by Byeonguk, the bad_words test in verifier_bits_iter.c
occasionally fails on s390 host. Quoting Ilya's explanation:
s390 kernel runs in a completely separate address space, there is no
user/kernel split at TASK_SIZE. The same address may be valid in both
the kernel and the user address spaces, there is no way to tell by
looking at it. The config option related to this property is
ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
Also, unfortunately, 0 is a valid address in the s390 kernel address
space.
Fix the issue by using -4095 as the bad address for bits iterator, as
suggested by Ilya. Verify that bpf_iter_bits_new() returns -EINVAL for
NULL address and -EFAULT for bad address.
Fixes: ebafc1e535db ("selftests/bpf: Add three test cases for bits_iter")
Reported-by: Byeonguk Jeong <jungbu2855@gmail.com>
Closes: https://lore.kernel.org/bpf/ZycSXwjH4UTvx-Cn@ub22/
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20241105043057.3371482-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in
raw_tp program. Without the previous fix, this selftest crashes the
kernel due to a NULL-pointer dereference. Also ensure that dead code
elimination does not kick in for checks on the pointer.
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments
can actually be NULL, and the verifier's knowledge, that they are never
NULL, causing explicit NULL checks to be deleted, and accesses to such
pointers potentially crashing the kernel.
To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special
case the dereference and pointer arithmetic to permit it, and allow
passing them into helpers/kfuncs; these exceptions are made for raw_tp
programs only. Ensure that we don't do this when ref_obj_id > 0, as in
that case this is an acquired object and doesn't need such adjustment.
The reason we do mask_raw_tp_trusted_reg logic is because other will
recheck in places whether the register is a trusted_reg, and then
consider our register as untrusted when detecting the presence of the
PTR_MAYBE_NULL flag.
To allow safe dereference, we enable PROBE_MEM marking when we see loads
into trusted pointers with PTR_MAYBE_NULL.
While trusted raw_tp arguments can also be passed into helpers or kfuncs
where such broken assumption may cause issues, a future patch set will
tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can
already be passed into helpers and causes similar problems. Thus, they
are left alone for now.
It is possible that these checks also permit passing non-raw_tp args
that are trusted PTR_TO_BTF_ID with null marking. In such a case,
allowing dereference when pointer is NULL expands allowed behavior, so
won't regress existing programs, and the case of passing these into
helpers is the same as above and will be dealt with later.
Also update the failure case in tp_btf_nullable selftest to capture the
new behavior, as the verifier will no longer cause an error when
directly dereference a raw tracepoint argument marked as __nullable.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add failure tests to ensure bugs don't slip through for tail calls and
lingering locks, RCU sections, preemption disabled sections, and
references prevent tail calls.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241103225940.1408302-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There are similar checks for covering locks, references, RCU read
sections and preempt_disable sections in 3 places in the verifer, i.e.
for tail calls, bpf_ld_[abs, ind], and exit path (for BPF_EXIT and
bpf_throw). Unify all of these into a common check_resource_leak
function to avoid code duplication.
Also update the error strings in selftests to the new ones in the same
change to ensure clean bisection.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241103225940.1408302-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-10-31
We've added 13 non-merge commits during the last 16 day(s) which contain
a total of 16 files changed, 710 insertions(+), 668 deletions(-).
The main changes are:
1) Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it, from Puranjay Mohan.
2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest
into test_progs so that it can be run in BPF CI, from Alexis Lothoré.
3) Two BPF sockmap selftest fixes, from Zijian Zhang.
4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check,
from Vincent Li.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add a selftest for bpf_csum_diff()
selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier
bpf: bpf_csum_diff: Optimize and homogenize for all archs
net: checksum: Move from32to16() to generic header
selftests/bpf: remove xdp_synproxy IP_DF check
selftests/bpf: remove test_tcp_check_syncookie
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
selftests/bpf: Fix msg_verify_data in test_sockmap
====================
Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The new subtest runs with bpf_prog_test_run_opts() as a syscall prog.
It iterates the kmem_cache using bpf_for_each loop and count the number
of entries. Finally it checks it with the number of entries from the
regular iterator.
$ ./vmtest.sh -- ./test_progs -t kmem_cache_iter
...
#130/1 kmem_cache_iter/check_task_struct:OK
#130/2 kmem_cache_iter/check_slabinfo:OK
#130/3 kmem_cache_iter/open_coded_iter:OK
#130 kmem_cache_iter:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
Also simplify the code by using attach routine of the skeleton.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241030222819.1800667-2-namhyung@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc6).
Conflicts:
drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links")
https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/
net/mac80211/cfg.c
c4382d5ca1af ("wifi: mac80211: update the right link for tx power")
8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx")
drivers/net/ethernet/intel/ice/ice_ptp_hw.h
6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM")
e4291b64e118 ("ice: Align E810T GPIO to other products")
ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions")
ac532f4f4251 ("ice: Cleanup unused declarations")
https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add more test cases for bits iterator:
(1) huge word test
Verify the multiplication overflow of nr_bits in bits_iter. Without
the overflow check, when nr_words is 67108865, nr_bits becomes 64,
causing bpf_probe_read_kernel_common() to corrupt the stack.
(2) max word test
Verify correct handling of maximum nr_words value (511).
(3) bad word test
Verify early termination of bits iteration when bits iterator
initialization fails.
Also rename bits_nomem to bits_too_big to better reflect its purpose.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a selftest for the bpf_csum_diff() helper. This selftests runs the
helper in all three configurations(push, pull, and diff) and verifies
its output. The correct results have been computed by hand and by the
helper's older implementation.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-5-puranjay@kernel.org
|
|
The bpf_csum_diff() helper has been fixed to return a 16-bit value for
all archs, so now we don't need to mask the result.
This commit is basically reverting the below:
commit 6185266c5a85 ("selftests/bpf: Mask bpf_csum_diff() return value
to 16 bits in test_verifier")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-4-puranjay@kernel.org
|
|
Hou Tao reported an issue with bpf_fastcall patterns allowing extra
stack space above MAX_BPF_STACK limit. This extra stack allowance is
not integrated properly with the following verifier parts:
- backtracking logic still assumes that stack can't exceed
MAX_BPF_STACK;
- bpf_verifier_env->scratched_stack_slots assumes only 64 slots are
available.
Here is an example of an issue with precision tracking
(note stack slot -8 tracked as precise instead of -520):
0: (b7) r1 = 42 ; R1_w=42
1: (b7) r2 = 42 ; R2_w=42
2: (7b) *(u64 *)(r10 -512) = r1 ; R1_w=42 R10=fp0 fp-512_w=42
3: (7b) *(u64 *)(r10 -520) = r2 ; R2_w=42 R10=fp0 fp-520_w=42
4: (85) call bpf_get_smp_processor_id#8 ; R0_w=scalar(...)
5: (79) r2 = *(u64 *)(r10 -520) ; R2_w=42 R10=fp0 fp-520_w=42
6: (79) r1 = *(u64 *)(r10 -512) ; R1_w=42 R10=fp0 fp-512_w=42
7: (bf) r3 = r10 ; R3_w=fp0 R10=fp0
8: (0f) r3 += r2
mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx -1
mark_precise: frame0: regs=r2 stack= before 7: (bf) r3 = r10
mark_precise: frame0: regs=r2 stack= before 6: (79) r1 = *(u64 *)(r10 -512)
mark_precise: frame0: regs=r2 stack= before 5: (79) r2 = *(u64 *)(r10 -520)
mark_precise: frame0: regs= stack=-8 before 4: (85) call bpf_get_smp_processor_id#8
mark_precise: frame0: regs= stack=-8 before 3: (7b) *(u64 *)(r10 -520) = r2
mark_precise: frame0: regs=r2 stack= before 2: (7b) *(u64 *)(r10 -512) = r1
mark_precise: frame0: regs=r2 stack= before 1: (b7) r2 = 42
9: R2_w=42 R3_w=fp42
9: (95) exit
This patch disables the additional allowance for the moment.
Also, two test cases are removed:
- bpf_fastcall_max_stack_ok:
it fails w/o additional stack allowance;
- bpf_fastcall_max_stack_fail:
this test is no longer necessary, stack size follows
regular rules, pattern invalidation is checked by other
test cases.
Reported-by: Hou Tao <houtao@huaweicloud.com>
Closes: https://lore.kernel.org/bpf/20241023022752.172005-1-houtao@huaweicloud.com/
Fixes: 5b5f51bff1b6 ("bpf: no_caller_saved_registers attribute for helper calls")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241029193911.1575719-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Drop bpf_iter.h header which uses vmlinux.h but re-defines a bunch of
iterator structures and some of BPF constants for use in BPF iterator
selftests.
None of that is necessary when fresh vmlinux.h header is generated for
vmlinux image that matches latest selftests. So drop ugly hacks and have
a nice plain vmlinux.h usage everywhere.
We could do the same with all the kfunc __ksym redefinitions, but that
has dependency on very fresh pahole, so I'm not addressing that here.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241029203919.1948941-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|