summaryrefslogtreecommitdiff
path: root/kernel/bpf/verifier.c
AgeCommit message (Collapse)AuthorFilesLines
2023-03-17kallsyms, bpf: Move find_kallsyms_symbol_value out of internal headerViktor Malik1-1/+1
Moving find_kallsyms_symbol_value from kernel/module/internal.h to include/linux/module.h. The reason is that internal.h is not prepared to be included when CONFIG_MODULES=n. find_kallsyms_symbol_value is used by kernel/bpf/verifier.c and including internal.h from it (without modules) leads into a compilation error: In file included from ../include/linux/container_of.h:5, from ../include/linux/list.h:5, from ../include/linux/timer.h:5, from ../include/linux/workqueue.h:9, from ../include/linux/bpf.h:10, from ../include/linux/bpf-cgroup.h:5, from ../kernel/bpf/verifier.c:7: ../kernel/bpf/../module/internal.h: In function 'mod_find': ../include/linux/container_of.h:20:54: error: invalid use of undefined type 'struct module' 20 | static_assert(__same_type(*(ptr), ((type *)0)->member) || \ | ^~ [...] This patch fixes the above error. Fixes: 31bf1dbccfb0 ("bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/oe-kbuild-all/202303161404.OrmfCy09-lkp@intel.com/ Link: https://lore.kernel.org/bpf/20230317095601.386738-1-vmalik@redhat.com
2023-03-17bpf: Remove misleading spec_v1 check on var-offset stack readLuis Gerhorst1-10/+6
For every BPF_ADD/SUB involving a pointer, adjust_ptr_min_max_vals() ensures that the resulting pointer has a constant offset if bypass_spec_v1 is false. This is ensured by calling sanitize_check_bounds() which in turn calls check_stack_access_for_ptr_arithmetic(). There, -EACCESS is returned if the register's offset is not constant, thereby rejecting the program. In summary, an unprivileged user must never be able to create stack pointers with a variable offset. That is also the case, because a respective check in check_stack_write() is missing. If they were able to create a variable-offset pointer, users could still use it in a stack-write operation to trigger unsafe speculative behavior [1]. Because unprivileged users must already be prevented from creating variable-offset stack pointers, viable options are to either remove this check (replacing it with a clarifying comment), or to turn it into a "verifier BUG"-message, also adding a similar check in check_stack_write() (for consistency, as a second-level defense). This patch implements the first option to reduce verifier bloat. This check was introduced by commit 01f810ace9ed ("bpf: Allow variable-offset stack access") which correctly notes that "variable-offset reads and writes are disallowed (they were already disallowed for the indirect access case) because the speculative execution checking code doesn't support them". However, it does not further discuss why the check in check_stack_read() is necessary. The code which made this check obsolete was also introduced in this commit. I have compiled ~650 programs from the Linux selftests, Linux samples, Cilium, and libbpf/examples projects and confirmed that none of these trigger the check in check_stack_read() [2]. Instead, all of these programs are, as expected, already rejected when constructing the variable-offset pointers. Note that the check in check_stack_access_for_ptr_arithmetic() also prints "off=%d" while the code removed by this patch does not (the error removed does not appear in the "verification_error" values). For reproducibility, the repository linked includes the raw data and scripts used to create the plot. [1] https://arxiv.org/pdf/1807.03757.pdf [2] https://gitlab.cs.fau.de/un65esoq/bpf-spectre/-/raw/53dc19fcf459c186613b1156a81504b39c8d49db/data/plots/23-02-26_23-56_bpftool/bpftool/0004-errors.pdf?inline=false Fixes: 01f810ace9ed ("bpf: Allow variable-offset stack access") Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230315165358.23701-1-gerhorst@cs.fau.de
2023-03-16bpf: Mark struct bpf_cpumask as rcu protectedDavid Vernet1-0/+1
struct bpf_cpumask is a BPF-wrapper around the struct cpumask type which can be instantiated by a BPF program, and then queried as a cpumask in similar fashion to normal kernel code. The previous patch in this series makes the type fully RCU safe, so the type can be included in the rcu_protected_type BTF ID list. A subsequent patch will remove bpf_cpumask_kptr_get(), as it's no longer useful now that we can just treat the type as RCU safe by default and do our own if check. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230316054028.88924-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-16bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modulesViktor Malik1-1/+17
This resolves two problems with attachment of fentry/fexit/fmod_ret/lsm to functions located in modules: 1. The verifier tries to find the address to attach to in kallsyms. This is always done by searching the entire kallsyms, not respecting the module in which the function is located. Such approach causes an incorrect attachment address to be computed if the function to attach to is shadowed by a function of the same name located earlier in kallsyms. 2. If the address to attach to is located in a module, the module reference is only acquired in register_fentry. If the module is unloaded between the place where the address is found (bpf_check_attach_target in the verifier) and register_fentry, it is possible that another module is loaded to the same address which may lead to potential errors. Since the attachment must contain the BTF of the program to attach to, we extract the module from it and search for the function address in the correct module (resolving problem no. 1). Then, the module reference is taken directly in bpf_check_attach_target and stored in the bpf program (in bpf_prog_aux). The reference is only released when the program is unloaded (resolving problem no. 2). Signed-off-by: Viktor Malik <vmalik@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/3f6a9d8ae850532b5ef864ef16327b0f7a669063.1678432753.git.vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14bpf: Allow helpers access trusted PTR_TO_BTF_ID.Alexei Starovoitov1-0/+15
The verifier rejects the code: bpf_strncmp(task->comm, 16, "my_task"); with the message: 16: (85) call bpf_strncmp#182 R1 type=trusted_ptr_ expected=fp, pkt, pkt_meta, map_key, map_value, mem, ringbuf_mem, buf Teach the verifier that such access pattern is safe. Do not allow untrusted and legacy ptr_to_btf_id to be passed into helpers. Reported-by: David Vernet <void@manifault.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230313235845.61029-3-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-13bpf: fix precision propagation verbose loggingAndrii Nakryiko1-2/+2
Fix wrong order of frame index vs register/slot index in precision propagation verbose (level 2) output. It's wrong and very confusing as is. Fixes: 529409ea92d5 ("bpf: propagate precision across all frames, not just the last one") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230313184017.4083374-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-11bpf: Allow local kptrs to be exchanged via bpf_kptr_xchgDave Marchevsky1-1/+7
The previous patch added necessary plumbing for verifier and runtime to know what to do with non-kernel PTR_TO_BTF_IDs in map values, but didn't provide any way to get such local kptrs into a map value. This patch modifies verifier handling of bpf_kptr_xchg to allow MEM_ALLOC kptr types. check_reg_type is modified accept MEM_ALLOC-flagged input to bpf_kptr_xchg despite such types not being in btf_ptr_types. This could have been done with a MAYBE_MEM_ALLOC equivalent to MAYBE_NULL, but bpf_kptr_xchg is the only helper that I can forsee using MAYBE_MEM_ALLOC, so keep it special-cased for now. The verifier tags bpf_kptr_xchg retval MEM_ALLOC if and only if the BTF associated with the retval is not kernel BTF. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230310230743.2320707-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10bpf: verifier: Rename kernel_type_name helper to btf_type_nameDave Marchevsky1-8/+8
kernel_type_name was introduced in commit 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF") with type signature: const char *kernel_type_name(u32 id) At that time the function used global btf_vmlinux BTF for all id lookups. Later, in commit 22dc4a0f5ed1 ("bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier"), the type signature was changed to: static const char *kernel_type_name(const struct btf* btf, u32 id) With the btf parameter used for lookups instead of global btf_vmlinux. The helper will function as expected for type name lookup using non-kernel BTFs, and will be used for such in further patches in the series. Let's rename it to avoid incorrect assumptions that might arise when seeing the current name. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230309180111.1618459-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10bpf: take into account liveness when propagating precisionAndrii Nakryiko1-2/+4
When doing state comparison, if old state has register that is not marked as REG_LIVE_READ, then we just skip comparison, regardless what's the state of corresponing register in current state. This is because not REG_LIVE_READ register is irrelevant for further program execution and correctness. All good here. But when we get to precision propagation, after two states were declared equivalent, we don't take into account old register's liveness, and thus attempt to propagate precision for register in current state even if that register in old state was not REG_LIVE_READ anymore. This is bad, because register in current state could be anything at all and this could cause -EFAULT due to internal logic bugs. Fix by taking into account REG_LIVE_READ liveness mark to keep the logic in state comparison in sync with precision propagation. Fixes: a3ce685dd01a ("bpf: fix precision tracking") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230309224131.57449-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10bpf: ensure state checkpointing at iter_next() call sitesAndrii Nakryiko1-3/+28
State equivalence check and checkpointing performed in is_state_visited() employs certain heuristics to try to save memory by avoiding state checkpoints if not enough jumps and instructions happened since last checkpoint. This leads to unpredictability of whether a particular instruction will be checkpointed and how regularly. While normally this is not causing much problems (except inconveniences for predictable verifier tests, which we overcome with BPF_F_TEST_STATE_FREQ flag), turns out it's not the case for open-coded iterators. Checking and saving state checkpoints at iter_next() call is crucial for fast convergence of open-coded iterator loop logic, so we need to force it. If we don't do that, is_state_visited() might skip saving a checkpoint, causing unnecessarily long sequence of not checkpointed instructions and jumps, leading to exhaustion of jump history buffer, and potentially other undesired outcomes. It is expected that with correct open-coded iterators convergence will happen quickly, so we don't run a risk of exhausting memory. This patch adds, in addition to prune and jump instruction marks, also a "forced checkpoint" mark, and makes sure that any iter_next() call instruction is marked as such. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230310060149.625887-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-09bpf: add support for open-coded iterator loopsAndrii Nakryiko1-8/+587
Teach verifier about the concept of the open-coded (or inline) iterators. This patch adds generic iterator loop verification logic, new STACK_ITER stack slot type to contain iterator state, and necessary kfunc plumbing for iterator's constructor, destructor and next methods. Next patch implements first specific iterator (numbers iterator for implementing for() loop logic). Such split allows to have more focused commits for verifier logic and separate commit that we could point later to demonstrating what does it take to add a new kind of iterator. Each kind of iterator has its own associated struct bpf_iter_<type>, where <type> denotes a specific type of iterator. struct bpf_iter_<type> state is supposed to live on BPF program stack, so there will be no way to change its size later on without breaking backwards compatibility, so choose wisely! But given this struct is specific to a given <type> of iterator, this allows a lot of flexibility: simple iterators could be fine with just one stack slot (8 bytes), like numbers iterator in the next patch, while some other more complicated iterators might need way more to keep their iterator state. Either way, such design allows to avoid runtime memory allocations, which otherwise would be necessary if we fixed on-the-stack size and it turned out to be too small for a given iterator implementation. The way BPF verifier logic is implemented, there are no artificial restrictions on a number of active iterators, it should work correctly using multiple active iterators at the same time. This also means you can have multiple nested iteration loops. struct bpf_iter_<type> reference can be safely passed to subprograms as well. General flow is easiest to demonstrate with a simple example using number iterator implemented in next patch. Here's the simplest possible loop: struct bpf_iter_num it; int *v; bpf_iter_num_new(&it, 2, 5); while ((v = bpf_iter_num_next(&it))) { bpf_printk("X = %d", *v); } bpf_iter_num_destroy(&it); Above snippet should output "X = 2", "X = 3", "X = 4". Note that 5 is exclusive and is not returned. This matches similar APIs (e.g., slices in Go or Rust) that implement a range of elements, where end index is non-inclusive. In the above example, we see a trio of function: - constructor, bpf_iter_num_new(), which initializes iterator state (struct bpf_iter_num it) on the stack. If any of the input arguments are invalid, constructor should make sure to still initialize it such that subsequent bpf_iter_num_next() calls will return NULL. I.e., on error, return error and construct empty iterator. - next method, bpf_iter_num_next(), which accepts pointer to iterator state and produces an element. Next method should always return a pointer. The contract between BPF verifier is that next method will always eventually return NULL when elements are exhausted. Once NULL is returned, subsequent next calls should keep returning NULL. In the case of numbers iterator, bpf_iter_num_next() returns a pointer to an int (storage for this integer is inside the iterator state itself), which can be dereferenced after corresponding NULL check. - once done with the iterator, it's mandated that user cleans up its state with the call to destructor, bpf_iter_num_destroy() in this case. Destructor frees up any resources and marks stack space used by struct bpf_iter_num as usable for something else. Any other iterator implementation will have to implement at least these three methods. It is enforced that for any given type of iterator only applicable constructor/destructor/next are callable. I.e., verifier ensures you can't pass number iterator state into, say, cgroup iterator's next method. It is important to keep the naming pattern consistent to be able to create generic macros to help with BPF iter usability. E.g., one of the follow up patches adds generic bpf_for_each() macro to bpf_misc.h in selftests, which allows to utilize iterator "trio" nicely without having to code the above somewhat tedious loop explicitly every time. This is enforced at kfunc registration point by one of the previous patches in this series. At the implementation level, iterator state tracking for verification purposes is very similar to dynptr. We add STACK_ITER stack slot type, reserve necessary number of slots, depending on sizeof(struct bpf_iter_<type>), and keep track of necessary extra state in the "main" slot, which is marked with non-zero ref_obj_id. Other slots are also marked as STACK_ITER, but have zero ref_obj_id. This is simpler than having a separate "is_first_slot" flag. Another big distinction is that STACK_ITER is *always refcounted*, which simplifies implementation without sacrificing usability. So no need for extra "iter_id", no need to anticipate reuse of STACK_ITER slots for new constructors, etc. Keeping it simple here. As far as the verification logic goes, there are two extensive comments: in process_iter_next_call() and iter_active_depths_differ() explaining some important and sometimes subtle aspects. Please refer to them for details. But from 10,000-foot point of view, next methods are the points of forking a verification state, which are conceptually similar to what verifier is doing when validating conditional jump. We branch out at a `call bpf_iter_<type>_next` instruction and simulate two outcomes: NULL (iteration is done) and non-NULL (new element is returned). NULL is simulated first and is supposed to reach exit without looping. After that non-NULL case is validated and it either reaches exit (for trivial examples with no real loop), or reaches another `call bpf_iter_<type>_next` instruction with the state equivalent to already (partially) validated one. State equivalency at that point means we technically are going to be looping forever without "breaking out" out of established "state envelope" (i.e., subsequent iterations don't add any new knowledge or constraints to the verifier state, so running 1, 2, 10, or a million of them doesn't matter). But taking into account the contract stating that iterator next method *has to* return NULL eventually, we can conclude that loop body is safe and will eventually terminate. Given we validated logic outside of the loop (NULL case), and concluded that loop body is safe (though potentially looping many times), verifier can claim safety of the overall program logic. The rest of the patch is necessary plumbing for state tracking, marking, validation, and necessary further kfunc plumbing to allow implementing iterator constructor, destructor, and next methods. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-09bpf: factor out fetching basic kfunc metadataAndrii Nakryiko1-33/+59
Factor out logic to fetch basic kfunc metadata based on struct bpf_insn. This is not exactly short or trivial code to just copy/paste and this information is sometimes necessary in other parts of the verifier logic. Subsequent patches will rely on this to determine if an instruction is a kfunc call to iterator next method. No functional changes intended, including that verbose() warning behavior when kfunc is not allowed for a particular program type. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: add support for fixed-size memory pointer returns for kfuncsAndrii Nakryiko1-0/+8
Support direct fixed-size (and for now, read-only) memory access when kfunc's return type is a pointer to non-struct type. Calculate type size and let BPF program access that many bytes directly. This is crucial for numbers iterator. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-13-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: generalize dynptr_get_spi to be usable for itersAndrii Nakryiko1-6/+12
Generalize the logic of fetching special stack slot object state using spi (stack slot index). This will be used by STACK_ITER logic next. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-12-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: mark PTR_TO_MEM as non-null register typeAndrii Nakryiko1-1/+2
PTR_TO_MEM register without PTR_MAYBE_NULL is indeed non-null. This is important for BPF verifier to be able to prune guaranteed not to be taken branches. This is always the case with open-coded iterators. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: move kfunc_call_arg_meta higher in the fileAndrii Nakryiko1-35/+35
Move struct bpf_kfunc_call_arg_meta higher in the file and put it next to struct bpf_call_arg_meta, so it can be used from more functions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: ensure that r0 is marked scratched after any function callAndrii Nakryiko1-0/+2
r0 is important (unless called function is void-returning, but that's taken care of by print_verifier_state() anyways) in verifier logs. Currently for helpers we seem to print it in verifier log, but for kfuncs we don't. Instead of figuring out where in the maze of code we accidentally set r0 as scratched for helpers and why we don't do that for kfuncs, just enforce that after any function call r0 is marked as scratched. Also, perhaps, we should reconsider "scratched" terminology, as it's mightily confusing. "Touched" would seem more appropriate. But I left that for follow ups for now. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helperAndrii Nakryiko1-1/+1
It's not correct to assume that any BPF_CALL instruction is a helper call. Fix visit_insn()'s detection of bpf_timer_set_callback() helper by also checking insn->code == 0. For kfuncs insn->code would be set to BPF_PSEUDO_KFUNC_CALL, and for subprog calls it will be BPF_PSEUDO_CALL. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: clean up visit_insn()'s instruction processingAndrii Nakryiko1-13/+12
Instead of referencing processed instruction repeatedly as insns[t] throughout entire visit_insn() function, take a local insn pointer and work with it in a cleaner way. It makes enhancing this function further a bit easier as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: honor env->test_state_freq flag in is_state_visited()Andrii Nakryiko1-1/+2
env->test_state_freq flag can be set by user by passing BPF_F_TEST_STATE_FREQ program flag. This is used in a bunch of selftests to have predictable state checkpoints at every jump and so on. Currently, bounded loop handling heuristic ignores this flag if number of processed jumps and/or number of processed instructions is below some thresholds, which throws off that reliable state checkpointing. Honor this flag in all circumstances by disabling heuristic if env->test_state_freq is set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}Andrii Nakryiko1-1/+5
Teach regsafe() logic to handle PTR_TO_MEM, PTR_TO_BUF, and PTR_TO_TP_BUFFER similarly to PTR_TO_MAP_{KEY,VALUE}. That is, instead of exact match for var_off and range, use tnum_in() and range_within() checks, allowing more general verified state to subsume more specific current state. This allows to match wider range of valid and safe states, speeding up verification and detecting wider range of equivalent states for upcoming open-coded iteration looping logic. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: improve stack slot state printingAndrii Nakryiko1-26/+49
Improve stack slot state printing to provide more useful and relevant information, especially for dynptrs. While previously we'd see something like: 8: (85) call bpf_ringbuf_reserve_dynptr#198 ; R0_w=scalar() fp-8_w=dddddddd fp-16_w=dddddddd refs=2 Now we'll see way more useful: 8: (85) call bpf_ringbuf_reserve_dynptr#198 ; R0_w=scalar() fp-16_w=dynptr_ringbuf(ref_id=2) refs=2 I experimented with printing the range of slots taken by dynptr, something like: fp-16..8_w=dynptr_ringbuf(ref_id=2) But it felt very awkward and pretty useless. So we print the lowest address (most negative offset) only. The general structure of this code is now also set up for easier extension and will accommodate ITER slots naturally. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04bpf: allow ctx writes using BPF_ST_MEM instructionEduard Zingerman1-58/+52
Lift verifier restriction to use BPF_ST_MEM instructions to write to context data structures. This requires the following changes: - verifier.c:do_check() for BPF_ST updated to: - no longer forbid writes to registers of type PTR_TO_CTX; - track dst_reg type in the env->insn_aux_data[...].ptr_type field (same way it is done for BPF_STX and BPF_LDX instructions). - verifier.c:convert_ctx_access() and various callbacks invoked by it are updated to handled BPF_ST instruction alongside BPF_STX. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03bpf: Refactor RCU enforcement in the verifier.Alexei Starovoitov1-57/+121
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack of such key mechanism makes it impossible for sleepable bpf programs to use RCU pointers. Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't support btf_type_tag yet) and allowlist certain field dereferences in important data structures like tast_struct, cgroup, socket that are used by sleepable programs either as RCU pointer or full trusted pointer (which is valid outside of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such tagging. They will be removed once GCC supports btf_type_tag. With that refactor check_ptr_to_btf_access(). Make it strict in enforcing PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without modifier flags. There is a chance that this strict enforcement might break existing programs (especially on GCC compiled kernels), but this cleanup has to start sooner than later. Note PTR_TO_CTX access still yields old deprecated PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0. Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
2023-03-03bpf: Introduce kptr_rcu.Alexei Starovoitov1-9/+46
The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Derefrence of other kptr-s returns PTR_UNTRUSTED. For example: struct map_value { struct cgroup __kptr *cgrp; }; SEC("tp_btf/cgroup_mkdir") int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path) { struct cgroup *cg, *cg2; cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0 bpf_kptr_xchg(&v->cgrp, cg); cg2 = v->cgrp; // This is new feature introduced by this patch. // cg2 is PTR_MAYBE_NULL | MEM_RCU. // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero if (cg2) bpf_cgroup_ancestor(cg2, level); // safe to do. } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
2023-03-03bpf: Mark cgroups and dfl_cgrp fields as trusted.Alexei Starovoitov1-0/+6
bpf programs sometimes do: bpf_cgrp_storage_get(&map, task->cgroups->dfl_cgrp, ...); It is safe to do, because cgroups->dfl_cgrp pointer is set diring init and never changes. The task->cgroups is also never NULL. It is also set during init and will change when task switches cgroups. For any trusted task pointer dereference of cgroups and dfl_cgrp should yield trusted pointers. The verifier wasn't aware of this. Hence in gcc compiled kernels task->cgroups dereference was producing PTR_TO_BTF_ID without modifiers while in clang compiled kernels the verifier recognizes __rcu tag in cgroups field and produces PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL. Tag cgroups and dfl_cgrp as trusted to equalize clang and gcc behavior. When GCC supports btf_type_tag such tagging will done directly in the type. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20230303041446.3630-3-alexei.starovoitov@gmail.com
2023-03-01bpf: Support kptrs in local storage mapsKumar Kartikeya Dwivedi1-4/+8
Enable support for kptrs in local storage maps by wiring up the freeing of these kptrs from map value. Freeing of bpf_local_storage_map is only delayed in case there are special fields, therefore bpf_selem_free_* path can also only dereference smap safely in that case. This is recorded using a bool utilizing a hole in bpF_local_storage_elem. It could have been tagged in the pointer value smap using the lowest bit (since alignment > 1), but since there was already a hole I went with the simpler option. Only the map structure freeing is delayed using RCU barriers, as the buckets aren't used when selem is being freed, so they can be freed once all readers of the bucket lists can no longer access it. Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230225154010.391965-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwrJoanne Koong1-4/+123
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr. The user must pass in a buffer to store the contents of the data slice if a direct pointer to the data cannot be obtained. For skb and xdp type dynptrs, these two APIs are the only way to obtain a data slice. However, for other types of dynptrs, there is no difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data. For skb type dynptrs, the data is copied into the user provided buffer if any of the data is not in the linear portion of the skb. For xdp type dynptrs, the data is copied into the user provided buffer if the data is between xdp frags. If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then the skb will be uncloned (see bpf_unclone_prologue()). Please note that any bpf_dynptr_write() automatically invalidates any prior data slices of the skb dynptr. This is because the skb may be cloned or may need to pull its paged buffer into the head. As such, any bpf_dynptr_write() will automatically have its prior data slices invalidated, even if the write is to data in the skb head of an uncloned skb. Please note as well that any other helper calls that change the underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data slices of the skb dynptr as well, for the same reasons. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add xdp dynptrsJoanne Koong1-0/+10
Add xdp dynptrs, which are dynptrs whose underlying pointer points to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() should be used. For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add skb dynptrsJoanne Koong1-0/+61
Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add __uninit kfunc annotationJoanne Koong1-2/+16
This patch adds __uninit as a kfunc annotation. This will be useful for scenarios such as for example in dynptrs, indicating whether the dynptr should be checked by the verifier as an initialized or an uninitialized dynptr. Without this annotation, the alternative would be needing to hard-code in the verifier the specific kfunc to indicate that arg should be treated as an uninitialized arg. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-7-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Refactor verifier dynptr into get_dynptr_arg_regJoanne Koong1-30/+50
This commit refactors the logic for determining which register in a function is the dynptr into "get_dynptr_arg_reg". This will be used in the future when the dynptr reg for BPF_FUNC_dynptr_write will need to be obtained in order to support writes for skb dynptrs. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-6-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Allow initializing dynptrs in kfuncsJoanne Koong1-45/+22
This change allows kfuncs to take in an uninitialized dynptr as a parameter. Before this change, only helper functions could successfully use uninitialized dynptrs. This change moves the memory access check (including stack state growing and slot marking) into process_dynptr_func(), which both helpers and kfuncs call into. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-4-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Refactor process_dynptr_funcJoanne Koong1-31/+31
This change cleans up process_dynptr_func's flow to be more intuitive and updates some comments with more context. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-3-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-23bpf: Check for helper calls in check_subprogs()Ilya Leoshkevich1-2/+2
The condition src_reg != BPF_PSEUDO_CALL && imm == BPF_FUNC_tail_call may be satisfied by a kfunc call. This would lead to unnecessarily setting has_tail_call. Use src_reg == 0 instead. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230220163756.753713-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22bpf: Wrap register invalidation with a helperKumar Kartikeya Dwivedi1-14/+14
Typically, verifier should use env->allow_ptr_leaks when invaliding registers for users that don't have CAP_PERFMON or CAP_SYS_ADMIN to avoid leaking the pointer value. This is similar in spirit to c67cae551f0d ("bpf: Tighten ptr_to_btf_id checks."). In a lot of the existing checks, we know the capabilities are present, hence we don't do the check. Instead of being inconsistent in the application of the check, wrap the action of invalidating a register into a helper named 'mark_invalid_reg' and use it in a uniform fashion to replace open coded invalidation operations, so that the check is always made regardless of the call site and we don't have to remember whether it needs to be done or not for each case. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230221200646.2500777-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22bpf: Fix check_reg_type for PTR_TO_BTF_IDKumar Kartikeya Dwivedi1-3/+20
The current code does type matching for the case where reg->type is PTR_TO_BTF_ID or has the PTR_TRUSTED flag. However, this only needs to occur for non-MEM_ALLOC and non-MEM_PERCPU cases, but will include both as per the current code. The MEM_ALLOC case with or without PTR_TRUSTED needs to be handled specially by the code for type_is_alloc case, while MEM_PERCPU case must be ignored. Hence, to restore correct behavior and for clarity, explicitly list out the handled PTR_TO_BTF_ID types which should be handled for each case using a switch statement. Helpers currently only take: PTR_TO_BTF_ID PTR_TO_BTF_ID | PTR_TRUSTED PTR_TO_BTF_ID | MEM_RCU PTR_TO_BTF_ID | MEM_ALLOC PTR_TO_BTF_ID | MEM_PERCPU PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED This fix was also described (for the MEM_ALLOC case) in [0]. [0]: https://lore.kernel.org/bpf/20221121160657.h6z7xuvedybp5y7s@apollo Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230221200646.2500777-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22bpf: Remove unused MEM_ALLOC | PTR_TRUSTED checksKumar Kartikeya Dwivedi1-2/+0
The plan is to supposedly tag everything with PTR_TRUSTED eventually, however those changes should bring in their respective code, instead of leaving it around right now. It is arguable whether PTR_TRUSTED is required for all types, when it's only use case is making PTR_TO_BTF_ID a bit stronger, while all other types are trusted by default. Hence, just drop the two instances which do not occur in the verifier for now to avoid reader confusion. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230221200646.2500777-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22bpf: Allow reads from uninit stackEduard Zingerman1-1/+10
This commits updates the following functions to allow reads from uninitialized stack locations when env->allow_uninit_stack option is enabled: - check_stack_read_fixed_off() - check_stack_range_initialized(), called from: - check_stack_read_var_off() - check_helper_mem_access() Such change allows to relax logic in stacksafe() to treat STACK_MISC and STACK_INVALID in a same way and make the following stack slot configurations equivalent: | Cached state | Current state | | stack slot | stack slot | |------------------+------------------| | STACK_INVALID or | STACK_INVALID or | | STACK_MISC | STACK_SPILL or | | | STACK_MISC or | | | STACK_ZERO or | | | STACK_DYNPTR | This leads to significant verification speed gains (see below). The idea was suggested by Andrii Nakryiko [1] and initial patch was created by Alexei Starovoitov [2]. Currently the env->allow_uninit_stack is allowed for programs loaded by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities. A number of test cases from verifier/*.c were expecting uninitialized stack access to be an error. These test cases were updated to execute in unprivileged mode (thus preserving the tests). The test progs/test_global_func10.c expected "invalid indirect read from stack" error message because of the access to uninitialized memory region. This error is no longer possible in privileged mode. The test is updated to provoke an error "invalid indirect access to stack" because of access to invalid stack address (such error is not verified by progs/test_global_func*.c series of tests). The following tests had to be removed because these can't be made unprivileged: - verifier/sock.c: - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value" BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode. - verifier/var_off.c: - "indirect variable-offset stack access, max_off+size > max_initialized" - "indirect variable-offset stack access, uninitialized" These tests verify that access to uninitialized stack values is detected when stack offset is not a constant. However, variable stack access is prohibited in unprivileged mode, thus these tests are no longer valid. * * * Here is veristat log comparing this patch with current master on a set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg and cilium BPF binaries (see [3]): $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log File Program States (A) States (B) States (DIFF) -------------------------- -------------------------- ---------- ---------- ---------------- bpf_host.o tail_handle_ipv6_from_host 349 244 -105 (-30.09%) bpf_host.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_lxc.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_sock.o cil_sock4_connect 70 48 -22 (-31.43%) bpf_sock.o cil_sock4_sendmsg 68 46 -22 (-32.35%) bpf_xdp.o tail_handle_nat_fwd_ipv4 1554 803 -751 (-48.33%) bpf_xdp.o tail_lb_ipv4 6457 2473 -3984 (-61.70%) bpf_xdp.o tail_lb_ipv6 7249 3908 -3341 (-46.09%) pyperf600_bpf_loop.bpf.o on_event 287 145 -142 (-49.48%) strobemeta.bpf.o on_event 15915 4772 -11143 (-70.02%) strobemeta_nounroll2.bpf.o on_event 17087 3820 -13267 (-77.64%) xdp_synproxy_kern.bpf.o syncookie_tc 21271 6635 -14636 (-68.81%) xdp_synproxy_kern.bpf.o syncookie_xdp 23122 6024 -17098 (-73.95%) -------------------------- -------------------------- ---------- ---------- ---------------- Note: I limited selection by states_pct<-30%. Inspection of differences in pyperf600_bpf_loop behavior shows that the following patch for the test removes almost all differences: - a/tools/testing/selftests/bpf/progs/pyperf.h + b/tools/testing/selftests/bpf/progs/pyperf.h @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx) } if (event->pthread_match || !pidData->use_tls) { - void* frame_ptr; - FrameData frame; + void* frame_ptr = 0; + FrameData frame = {}; Symbol sym = {}; int cur_cpu = bpf_get_smp_processor_id(); W/o this patch the difference comes from the following pattern (for different variables): static bool get_frame_data(... FrameData *frame ...) { ... bpf_probe_read_user(&frame->f_code, ...); if (!frame->f_code) return false; ... bpf_probe_read_user(&frame->co_name, ...); if (frame->co_name) ...; } int __on_event(struct bpf_raw_tracepoint_args *ctx) { FrameData frame; ... get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback ... } SEC("raw_tracepoint/kfree_skb") int on_event(struct bpf_raw_tracepoint_args* ctx) { ... ret |= __on_event(ctx); ret |= __on_event(ctx); ... } With regards to value `frame->co_name` the following is important: - Because of the conditional `if (!frame->f_code)` each call to __on_event() produces two states, one with `frame->co_name` marked as STACK_MISC, another with it as is (and marked STACK_INVALID on a first call). - The call to bpf_probe_read_user() does not mark stack slots corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks these slots as BPF_MISC, this happens because of the following loop in the check_helper_call(): for (i = 0; i < meta.access_size; i++) { err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B, BPF_WRITE, -1, false); if (err) return err; } Note the size of the write, it is a one byte write for each byte touched by a helper. The BPF_B write does not lead to write marks for the target stack slot. - Which means that w/o this patch when second __on_event() call is verified `if (frame->co_name)` will propagate read marks first to a stack slot with STACK_MISC marks and second to a stack slot with STACK_INVALID marks and these states would be considered different. [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/ [3] git@github.com:anakryiko/cilium.git Suggested-by: Andrii Nakryiko <andrii@kernel.org> Co-developed-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15bpf: BPF_ST with variable offset should preserve STACK_ZERO marksEduard Zingerman1-1/+3
BPF_STX instruction preserves STACK_ZERO marks for variable offset writes in situations like below: *(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now a variable offset pointer to stack r1 = 0 *(u8*)(r0) = r1 ; BPF_STX writing zero, STACK_ZERO mark for ; fp[-8] is preserved This commit updates verifier.c:check_stack_write_var_off() to process BPF_ST in a similar manner, e.g. the following example: *(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now variable offset pointer to stack *(u8*)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for ; fp[-8] is preserved Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230214232030.1502829-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15bpf: track immediate values written to stack by BPF_ST instructionEduard Zingerman1-2/+16
For aligned stack writes using BPF_ST instruction track stored values in a same way BPF_STX is handled, e.g. make sure that the following commands produce similar verifier knowledge: fp[-8] = 42; r1 = 42; fp[-8] = r1; This covers two cases: - non-null values written to stack are stored as spill of fake registers; - null values written to stack are stored as STACK_ZERO marks. Previously both cases above used STACK_MISC marks instead. Some verifier test cases relied on the old logic to obtain STACK_MISC marks for some stack values. These test cases are updated in the same commit to avoid failures during bisect. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230214232030.1502829-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14bpf: Special verifier handling for bpf_rbtree_{remove, first}Dave Marchevsky1-12/+31
Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties that require handling in the verifier: * both bpf_rbtree_remove and bpf_rbtree_first return the type containing the bpf_rb_node field, with the offset set to that field's offset, instead of a struct bpf_rb_node * * mark_reg_graph_node helper added in previous patch generalizes this logic, use it * bpf_rbtree_remove's node input is a node that's been inserted in the tree - a non-owning reference. * bpf_rbtree_remove must invalidate non-owning references in order to avoid aliasing issue. Use previously-added invalidate_non_owning_refs helper to mark this function as a non-owning ref invalidation point. * Unlike other functions, which convert one of their input arg regs to non-owning reference, bpf_rbtree_first takes no arguments and just returns a non-owning reference (possibly null) * For now verifier logic for this is special-cased instead of adding new kfunc flag. This patch, along with the previous one, complete special verifier handling for all rbtree API functions added in this series. With functional verifier handling of rbtree_remove, under current non-owning reference scheme, a node type with both bpf_{list,rb}_node fields could cause the verifier to accept programs which remove such nodes from collections they haven't been added to. In order to prevent this, this patch adds a check to btf_parse_fields which rejects structs with both bpf_{list,rb}_node fields. This is a temporary measure that can be removed after "collection identity" followup. See comment added in btf_parse_fields. A linked_list BTF test exercising the new check is added in this patch as well. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14bpf: Add callback validation to kfunc verifier logicDave Marchevsky1-5/+129
Some BPF helpers take a callback function which the helper calls. For each helper that takes such a callback, there's a special call to __check_func_call with a callback-state-setting callback that sets up verifier bpf_func_state for the callback's frame. kfuncs don't have any of this infrastructure yet, so let's add it in this patch, following existing helper pattern as much as possible. To validate functionality of this added plumbing, this patch adds callback handling for the bpf_rbtree_add kfunc and hopes to lay groundwork for future graph datastructure callbacks. In the "general plumbing" category we have: * check_kfunc_call doing callback verification right before clearing CALLER_SAVED_REGS, exactly like check_helper_call * recognition of func_ptr BTF types in kfunc args as KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type In the "rbtree_add / graph datastructure-specific plumbing" category: * Since bpf_rbtree_add must be called while the spin_lock associated with the tree is held, don't complain when callback's func_state doesn't unlock it by frame exit * Mark rbtree_add callback's args with ref_set_non_owning to prevent rbtree api functions from being called in the callback. Semantically this makes sense, as less() takes no ownership of its args when determining which comes first. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc argsDave Marchevsky1-35/+203
Now that we find bpf_rb_root and bpf_rb_node in structs, let's give args that contain those types special classification and properly handle these types when checking kfunc args. "Properly handling" these types largely requires generalizing similar handling for bpf_list_{head,node}, with little new logic added in this patch. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14bpf: Add bpf_rbtree_{add,remove,first} kfuncsDave Marchevsky1-1/+13
This patch adds implementations of bpf_rbtree_{add,remove,first} and teaches verifier about their BTF_IDs as well as those of bpf_rb_{root,node}. All three kfuncs have some nonstandard component to their verification that needs to be addressed in future patches before programs can properly use them: * bpf_rbtree_add: Takes 'less' callback, need to verify it * bpf_rbtree_first: Returns ptr_to_node_type(off=rb_node_off) instead of ptr_to_rb_node(off=0). Return value ref is non-owning. * bpf_rbtree_remove: Returns ptr_to_node_type(off=rb_node_off) instead of ptr_to_rb_node(off=0). 2nd arg (node) is a non-owning reference. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14bpf: Add basic bpf_rb_{root,node} supportDave Marchevsky1-2/+3
This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new types, and adds bpf_rb_root_free function for freeing bpf_rb_root in map_values. structs bpf_rb_root and bpf_rb_node are opaque types meant to obscure structs rb_root_cached rb_node, respectively. btf_struct_access will prevent BPF programs from touching these special fields automatically now that they're recognized. btf_check_and_fixup_fields now groups list_head and rb_root together as "graph root" fields and {list,rb}_node as "graph node", and does same ownership cycle checking as before. Note that this function does _not_ prevent ownership type mixups (e.g. rb_root owning list_node) - that's handled by btf_parse_graph_root. After this patch, a bpf program can have a struct bpf_rb_root in a map_value, but not add anything to nor do anything useful with it. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-14bpf: Migrate release_on_unlock logic to non-owning ref semanticsDave Marchevsky1-49/+119
This patch introduces non-owning reference semantics to the verifier, specifically linked_list API kfunc handling. release_on_unlock logic for refs is refactored - with small functional changes - to implement these semantics, and bpf_list_push_{front,back} are migrated to use them. When a list node is pushed to a list, the program still has a pointer to the node: n = bpf_obj_new(typeof(*n)); bpf_spin_lock(&l); bpf_list_push_back(&l, n); /* n still points to the just-added node */ bpf_spin_unlock(&l); What the verifier considers n to be after the push, and thus what can be done with n, are changed by this patch. Common properties both before/after this patch: * After push, n is only a valid reference to the node until end of critical section * After push, n cannot be pushed to any list * After push, the program can read the node's fields using n Before: * After push, n retains the ref_obj_id which it received on bpf_obj_new, but the associated bpf_reference_state's release_on_unlock field is set to true * release_on_unlock field and associated logic is used to implement "n is only a valid ref until end of critical section" * After push, n cannot be written to, the node must be removed from the list before writing to its fields * After push, n is marked PTR_UNTRUSTED After: * After push, n's ref is released and ref_obj_id set to 0. NON_OWN_REF type flag is added to reg's type, indicating that it's a non-owning reference. * NON_OWN_REF flag and logic is used to implement "n is only a valid ref until end of critical section" * n can be written to (except for special fields e.g. bpf_list_node, timer, ...) Summary of specific implementation changes to achieve the above: * release_on_unlock field, ref_set_release_on_unlock helper, and logic to "release on unlock" based on that field are removed * The anonymous active_lock struct used by bpf_verifier_state is pulled out into a named struct bpf_active_lock. * NON_OWN_REF type flag is introduced along with verifier logic changes to handle non-owning refs * Helpers are added to use NON_OWN_REF flag to implement non-owning ref semantics as described above * invalidate_non_owning_refs - helper to clobber all non-owning refs matching a particular bpf_active_lock identity. Replaces release_on_unlock logic in process_spin_lock. * ref_set_non_owning - set NON_OWN_REF type flag after doing some sanity checking * ref_convert_owning_non_owning - convert owning reference w/ specified ref_obj_id to non-owning references. Set NON_OWN_REF flag for each reg with that ref_obj_id and 0-out its ref_obj_id * Update linked_list selftests to account for minor semantic differences introduced by this patch * Writes to a release_on_unlock node ref are not allowed, while writes to non-owning reference pointees are. As a result the linked_list "write after push" failure tests are no longer scenarios that should fail. * The test##missing_lock##op and test##incorrect_lock##op macro-generated failure tests need to have a valid node argument in order to have the same error output as before. Otherwise verification will fail early and the expected error output won't be seen. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230212092715.1422619-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-7/+18
net/core/gro.c 7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO") b1a78b9b9886 ("net: add support for ipv4 big tcp") https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28Merge tag 'for-netdev' of ↵Jakub Kicinski1-85/+459
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-28 We've added 124 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 6386 insertions(+), 1827 deletions(-). The main changes are: 1) Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs, from Stanislav Fomichev and Toke Høiland-Jørgensen. Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk 2) Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case, from Andrii Nakryiko. 3) Significantly reduce the search time for module symbols by livepatch and BPF, from Jiri Olsa and Zhen Lei. 4) Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals, from David Vernet. 5) Fix several issues in the dynptr processing such as stack slot liveness propagation, missing checks for PTR_TO_STACK variable offset, etc, from Kumar Kartikeya Dwivedi. 6) Various performance improvements, fixes, and introduction of more than just one XDP program to XSK selftests, from Magnus Karlsson. 7) Big batch to BPF samples to reduce deprecated functionality, from Daniel T. Lee. 8) Enable struct_ops programs to be sleepable in verifier, from David Vernet. 9) Reduce pr_warn() noise on BTF mismatches when they are expected under the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien. 10) Describe modulo and division by zero behavior of the BPF runtime in BPF's instruction specification document, from Dave Thaler. 11) Several improvements to libbpf API documentation in libbpf.h, from Grant Seltzer. 12) Improve resolve_btfids header dependencies related to subcmd and add proper support for HOSTCC, from Ian Rogers. 13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room() helper along with BPF selftests, from Ziyang Xuan. 14) Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler, from Pu Lehui. 15) Get BTF working for kernels with CONFIG_RUST enabled by excluding Rust compilation units with pahole, from Martin Rodriguez Reboredo. 16) Get bpf_setsockopt() working for kTLS on top of TCP sockets, from Kui-Feng Lee. 17) Disable stack protection for BPF objects in bpftool given BPF backends don't support it, from Holger Hoffstätte. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits) selftest/bpf: Make crashes more debuggable in test_progs libbpf: Add documentation to map pinning API functions libbpf: Fix malformed documentation formatting selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket. bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). bpf/selftests: Verify struct_ops prog sleepable behavior bpf: Pass const struct bpf_prog * to .check_member libbpf: Support sleepable struct_ops.s section bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable selftests/bpf: Fix vmtest static compilation error tools/resolve_btfids: Alter how HOSTCC is forced tools/resolve_btfids: Install subcmd headers bpf/docs: Document the nocast aliasing behavior of ___init bpf/docs: Document how nested trusted fields may be defined bpf/docs: Document cpumask kfuncs in a new file selftests/bpf: Add selftest suite for cpumask kfuncs selftests/bpf: Add nested trust selftests suite bpf: Enable cpumasks to be queried and used as kptrs bpf: Disallow NULLable pointers for trusted kfuncs ... ==================== Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25bpf: Pass const struct bpf_prog * to .check_memberDavid Vernet1-1/+1
The .check_member field of struct bpf_struct_ops is currently passed the member's btf_type via const struct btf_type *t, and a const struct btf_member *member. This allows the struct_ops implementation to check whether e.g. an ops is supported, but it would be useful to also enforce that the struct_ops prog being loaded for that member has other qualities, like being sleepable (or not). This patch therefore updates the .check_member() callback to also take a const struct bpf_prog *prog argument. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125164735.785732-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>