kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2021-07-31	tools: bpftool: Complete and synchronise attach or map types	Quentin Monnet	4	-5/+12
	Update bpftool's list of attach type names to tell it about the latest attach types, or the "ringbuf" map. Also update the documentation, help messages, and bash completion when relevant. These missing items were reported by the newly added Python script used to help maintain consistency in bpftool. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210730215435.7095-4-quentin@isovalent.com
2021-07-31	selftests/bpf: Check consistency between bpftool source, doc, completion	Quentin Monnet	1	-0/+486
	Whenever the eBPF subsystem gains new elements, such as new program or map types, it is necessary to update bpftool if we want it able to handle the new items. In addition to the main arrays containing the names of these elements in the source code, there are also multiple locations to update: - The help message in the do_help() functions in bpftool's source code. - The RST documentation files. - The bash completion file. This has led to omissions multiple times in the past. This patch attempts to address this issue by adding consistency checks for all these different locations. It also verifies that the bpf_prog_type, bpf_map_type and bpf_attach_type enums from the UAPI BPF header have all their members present in bpftool. The script requires no argument to run, it reads and parses the different files to check, and prints the mismatches, if any. It currently reports a number of missing elements, which will be fixed in a later patch: $ ./test_bpftool_synctypes.py Comparing [...]/linux/tools/bpf/bpftool/map.c (map_type_name) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_MAP_CREATE_TYPES): {'ringbuf'} Comparing BPF header (enum bpf_attach_type) and [...]/linux/tools/bpf/bpftool/common.c (attach_type_name): {'BPF_TRACE_ITER', 'BPF_XDP_DEVMAP', 'BPF_XDP', 'BPF_SK_REUSEPORT_SELECT', 'BPF_XDP_CPUMAP', 'BPF_SK_REUSEPORT_SELECT_OR_MIGRATE'} Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/prog.c (do_help() ATTACH_TYPE): {'skb_verdict'} Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/Documentation/bpftool-prog.rst (ATTACH_TYPE): {'skb_verdict'} Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_PROG_ATTACH_TYPES): {'skb_verdict'} Note that the script does NOT check for consistency between the list of program types that bpftool claims it accepts and the actual list of keywords that can be used. This is because bpftool does not "see" them, they are ELF section names parsed by libbpf. It is not hard to parse the section_defs[] array in libbpf, but some section names are associated with program types that bpftool cannot load at the moment. For example, some programs require a BTF target and an attach target that bpftool cannot handle. The script may be extended to parse the array and check only relevant values in the future. The script is not added to the selftests' Makefile, because doing so would require all patches with BPF UAPI change to also update bpftool. Instead it is to be added to the CI. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210730215435.7095-3-quentin@isovalent.com
2021-07-31	tools: bpftool: Slightly ease bash completion updates	Quentin Monnet	1	-25/+29
	Bash completion for bpftool gets two minor improvements in this patch. Move the detection of attach types for "bpftool cgroup attach" outside of the "case/esac" bloc, where we cannot reuse our variable holding the list of supported attach types as a pattern list. After the change, we have only one list of cgroup attach types to update when new types are added, instead of the former two lists. Also rename the variables holding lists of names for program types, map types, and attach types, to make them more unique. This can make it slightly easier to point people to the relevant variables to update, but the main objective here is to help run a script to check that bash completion is up-to-date with bpftool's source code. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210730215435.7095-2-quentin@isovalent.com
2021-07-30	unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()	Cong Wang	1	-8/+8
	As Eric noticed, __unix_dgram_recvmsg() may acquire u->iolock too, so we have to release it before calling this function. Fixes: 9825d866ce0d ("af_unix: Implement unix_dgram_bpf_recvmsg()") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com>
2021-07-30	libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf	Hengqi Chen	4	-6/+20
	Add two new APIs: btf__load_vmlinux_btf and btf__load_module_btf. btf__load_vmlinux_btf is just an alias to the existing API named libbpf_find_kernel_btf, rename to be more precisely and consistent with existing BTF APIs. btf__load_module_btf can be used to load module BTF, add it for completeness. These two APIs are useful for implementing tracing tools and introspection tools. This is part of the effort towards libbpf 1.0 ([0]). [0] Closes: https://github.com/libbpf/libbpf/issues/280 Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210730114012.494408-1-hengqi.chen@gmail.com
2021-07-30	Merge branch 'libbpf: rename btf__get_from_id() and btf__load() APIs, ↵	Andrii Nakryiko	11	-47/+90
	support split BTF' Quentin Monnet says: ==================== As part of the effort to move towards a v1.0 for libbpf [0], this set improves some confusing function names related to BTF loading from and to the kernel: - btf__load() becomes btf__load_into_kernel(). - btf__get_from_id becomes btf__load_from_kernel_by_id(). - A new version btf__load_from_kernel_by_id_split() extends the former to add support for split BTF. The last patch is a trivial change to bpftool to add support for dumping split BTF objects by referencing them by their id (and not only by their BTF path). [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis v3: - Use libbpf_err_ptr() in btf__load_from_kernel_by_id(), ERR_PTR() in bpftool's get_map_kv_btf(). - Move the definition of btf__load_from_kernel_by_id() closer to the btf__parse() group in btf.h (move the legacy function with it). - Fix a bug on the return value in libbpf_find_prog_btf_id(), as a new patch. - Move the btf__free() fixes to their own patch. - Add "Fixes:" tags to relevant patches. v2: - Remove deprecation marking of legacy functions (patch 4/6 from v1). - Make btf__load_from_kernel_by_id{,_split}() return the btf struct, adjust surrounding code and call btf__free() when missing. - Add new functions to v0.5.0 API (and not v0.6.0). ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-07-30	tools: bpftool: Support dumping split BTF by id	Quentin Monnet	1	-1/+1
	Split BTF objects are typically BTF objects for kernel modules, which are incrementally built on top of kernel BTF instead of redefining all kernel symbols they need. We can use bpftool with its -B command-line option to dump split BTF objects. It works well when the handle provided for the BTF object to dump is a "path" to the BTF object, typically under /sys/kernel/btf, because bpftool internally calls btf__parse_split() which can take a "base_btf" pointer and resolve the BTF reconstruction (although in that case, the "-B" option is unnecessary because bpftool performs autodetection). However, it did not work so far when passing the BTF object through its id, because bpftool would call btf__get_from_id() which did not provide a way to pass a "base_btf" pointer. In other words, the following works: # bpftool btf dump file /sys/kernel/btf/i2c_smbus -B /sys/kernel/btf/vmlinux But this was not possible: # bpftool btf dump id 6 -B /sys/kernel/btf/vmlinux The libbpf API has recently changed, and btf__get_from_id() has been deprecated in favour of btf__load_from_kernel_by_id() and its version with support for split BTF, btf__load_from_kernel_by_id_split(). Let's update bpftool to make it able to dump the BTF object in the second case as well. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210729162028.29512-9-quentin@isovalent.com
2021-07-30	libbpf: Add split BTF support for btf__load_from_kernel_by_id()	Quentin Monnet	3	-2/+9
	Add a new API function btf__load_from_kernel_by_id_split(), which takes a pointer to a base BTF object in order to support split BTF objects when retrieving BTF information from the kernel. Reference: https://github.com/libbpf/libbpf/issues/314 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210729162028.29512-8-quentin@isovalent.com
2021-07-30	tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()	Quentin Monnet	7	-29/+42
	Replace the calls to function btf__get_from_id(), which we plan to deprecate before the library reaches v1.0, with calls to btf__load_from_kernel_by_id() in tools/ (bpftool, perf, selftests). Update the surrounding code accordingly (instead of passing a pointer to the btf struct, get it as a return value from the function). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210729162028.29512-6-quentin@isovalent.com
2021-07-30	tools: Free BTF objects at various locations	Quentin Monnet	4	-4/+9
	Make sure to call btf__free() (and not simply free(), which does not free all pointers stored in the struct) on pointers to struct btf objects retrieved at various locations. These were found while updating the calls to btf__get_from_id(). Fixes: 999d82cbc044 ("tools/bpf: enhance test_btf file testing to test func info") Fixes: 254471e57a86 ("tools/bpf: bpftool: add support for func types") Fixes: 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF programs") Fixes: d56354dc4909 ("perf tools: Save bpf_prog_info and BTF of new BPF programs") Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") Fixes: fa853c4b839e ("perf stat: Enable counting events for BPF programs") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210729162028.29512-5-quentin@isovalent.com
2021-07-30	libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()	Quentin Monnet	4	-11/+24
	Rename function btf__get_from_id() as btf__load_from_kernel_by_id() to better indicate what the function does. Change the new function so that, instead of requiring a pointer to the pointer to update and returning with an error code, it takes a single argument (the id of the BTF object) and returns the corresponding pointer. This is more in line with the existing constructors. The other tools calling the (soon-to-be) deprecated btf__get_from_id() function will be updated in a future commit. References: - https://github.com/libbpf/libbpf/issues/278 - https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210729162028.29512-4-quentin@isovalent.com
2021-07-30	libbpf: Rename btf__load() as btf__load_into_kernel()	Quentin Monnet	4	-2/+5
	As part of the effort to move towards a v1.0 for libbpf, rename btf__load() function, used to "upload" BTF information into the kernel, as btf__load_into_kernel(). This new name better reflects what the function does. References: - https://github.com/libbpf/libbpf/issues/278 - https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210729162028.29512-3-quentin@isovalent.com
2021-07-30	libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()	Quentin Monnet	1	-1/+3
	Variable "err" is initialised to -EINVAL so that this error code is returned when something goes wrong in libbpf_find_prog_btf_id(). However, a recent change in the function made use of the variable in such a way that it is set to 0 if retrieving linear information on the program is successful, and this 0 value remains if we error out on failures at later stages. Let's fix this by setting err to -EINVAL later in the function. Fixes: e9fc3ce99b34 ("libbpf: Streamline error reporting for high-level APIs") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210729162028.29512-2-quentin@isovalent.com
2021-07-30	bpf: Emit better log message if bpf_iter ctx arg btf_id == 0	Yonghong Song	1	-0/+5
	To avoid kernel build failure due to some missing .BTF-ids referenced functions/types, the patch ([1]) tries to fill btf_id 0 for these types. In bpf verifier, for percpu variable and helper returning btf_id cases, verifier already emitted proper warning with something like verbose(env, "Helper has invalid btf_id in R%d\n", regno); verbose(env, "invalid return type %d of func %s#%d\n", fn->ret_type, func_id_name(func_id), func_id); But this is not the case for bpf_iter context arguments. I hacked resolve_btfids to encode btf_id 0 for struct task_struct. With `./test_progs -n 7/5`, I got, 0: (79) r2 = (u64 )(r1 +0) func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta' ; struct seq_file seq = ctx->meta->seq; 1: (79) r6 = (u64 )(r2 +0) ; struct task_struct task = ctx->task; 2: (79) r7 = (u64 )(r1 +8) ; if (task == (void )0) { 3: (55) if r7 != 0x0 goto pc+11 ... ; BPF_SEQ_PRINTF(seq, "%8d %8d\n", task->tgid, task->pid); 26: (61) r1 = (u32 )(r7 +1372) Type '(anon)' is not a struct Basically, verifier will return btf_id 0 for task_struct. Later on, when the code tries to access task->tgid, the verifier correctly complains the type is '(anon)' and it is not a struct. Users still need to backtrace to find out what is going on. Let us catch the invalid btf_id 0 earlier and provide better message indicating btf_id is wrong. The new error message looks like below: R1 type=ctx expected=fp ; struct seq_file seq = ctx->meta->seq; 0: (79) r2 = (u64 )(r1 +0) func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta' ; struct seq_file seq = ctx->meta->seq; 1: (79) r6 = (u64 )(r2 +0) ; struct task_struct task = ctx->task; 2: (79) r7 = (u64 )(r1 +8) invalid btf_id for context argument offset 8 invalid bpf_context access off=8 size=8 [1] https://lore.kernel.org/bpf/20210727132532.2473636-1-hengqi.chen@gmail.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210728183025.1461750-1-yhs@fb.com
2021-07-30	tools/resolve_btfids: Emit warnings and patch zero id for missing symbols	Hengqi Chen	1	-6/+7
	Kernel functions referenced by .BTF_ids may be changed from global to static and get inlined or get renamed/removed, and thus disappears from BTF. This causes kernel build failure when resolve_btfids do id patch for symbols in .BTF_ids in vmlinux. Update resolve_btfids to emit warning messages and patch zero id for missing symbols instead of aborting kernel build process. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210727132532.2473636-2-hengqi.chen@gmail.com
2021-07-28	bpf: Increase supported cgroup storage value size	Stanislav Fomichev	4	-16/+45
	Current max cgroup storage value size is 4k (PAGE_SIZE). The other local storages accept up to 64k (BPF_LOCAL_STORAGE_MAX_VALUE_SIZE). Let's align max cgroup value size with the other storages. For percpu, the max is 32k (PCPU_MIN_UNIT_SIZE) because percpu allocator is not happy about larger values. netcnt test is extended to exercise those maximum values (non-percpu max size is close to, but not real max). v4: * remove inner union (Andrii Nakryiko) * keep net_cnt on the stack (Andrii Nakryiko) v3: * refine SIZEOF_BPF_LOCAL_STORAGE_ELEM comment (Yonghong Song) * anonymous struct in percpu_net_cnt & net_cnt (Yonghong Song) * reorder free (Yonghong Song) v2: * cap max_value_size instead of BUILD_BUG_ON (Martin KaFai Lau) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210727222335.4029096-1-sdf@google.com
2021-07-28	libbpf: Fix race when pinning maps in parallel	Martynas Pumputis	1	-1/+14
	When loading in parallel multiple programs which use the same to-be pinned map, it is possible that two instances of the loader will call bpf_object__create_maps() at the same time. If the map doesn't exist when both instances call bpf_object__reuse_map(), then one of the instances will fail with EEXIST when calling bpf_map__pin(). Fix the race by retrying reusing a map if bpf_map__pin() returns EEXIST. The fix is similar to the one in iproute2: e4c4685fd6e4 ("bpf: Fix race condition with map pinning"). Before retrying the pinning, we don't do any special cleaning of an internal map state. The closer code inspection revealed that it's not required: - bpf_object__create_map(): map->inner_map is destroyed after a successful call, map->fd is closed if pinning fails. - bpf_object__populate_internal_map(): created map elements is destroyed upon close(map->fd). - init_map_slots(): slots are freed after their initialization. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210726152001.34845-1-m@lambda.lt
2021-07-28	libbpf: Fix comment typo	Jason Wang	1	-3/+3
	Remove the repeated word 'the' in line 48. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210727115928.74600-1-wangborong@cdjrlc.com
2021-07-27	samples: bpf: Add the omitted xdp samples to .gitignore	Juhee Kang	1	-0/+2
	There are recently added xdp samples (xdp_redirect_map_multi and xdpsock_ctrl_proc) which are not managed by .gitignore. This commit adds these files to .gitignore. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210727041056.23455-2-claudiajkang@gmail.com
2021-07-27	samples: bpf: Fix tracex7 error raised on the missing argument	Juhee Kang	2	-0/+6
	The current behavior of 'tracex7' doesn't consist with other bpf samples tracex{1..6}. Other samples do not require any argument to run with, but tracex7 should be run with btrfs device argument. (it should be executed with test_override_return.sh) Currently, tracex7 doesn't have any description about how to run this program and raises an unexpected error. And this result might be confusing since users might not have a hunch about how to run this program. // Current behavior # ./tracex7 sh: 1: Syntax error: word unexpected (expecting ")") // Fixed behavior # ./tracex7 ERROR: Run with the btrfs device argument! In order to fix this error, this commit adds logic to report a message and exit when running this program with a missing argument. Additionally in test_override_return.sh, there is a problem with multiple directory(tmpmnt) creation. So in this commit adds a line with removing the directory with every execution. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210727041056.23455-1-claudiajkang@gmail.com
2021-07-27	selftests/bpf: Use ping6 only if available in tc_redirect	Jussi Maki	1	-6/+17
	In the tc_redirect test only use ping6 if it's available and otherwise fall back to using "ping -6". Signed-off-by: Jussi Maki <joamaki@gmail.com>
2021-07-26	Merge branch 'libbpf: Move CO-RE logic into separate file.'	Andrii Nakryiko	5	-1382/+1440
	Alexei Starovoitov says: ==================== From: Alexei Starovoitov <ast@kernel.org> Split CO-RE processing logic from libbpf into separate file with an interface that doesn't dependend on libbpf internal details. As the next step relo_core.c will be compiled with libbpf and with the kernel. The _internal_ interface between libbpf/CO-RE and kernel/CO-RE will be: int bpf_core_apply_relo_insn(const char prog_name, struct bpf_insn insn, int insn_idx, const struct bpf_core_relo relo, int relo_idx, const struct btf local_btf, struct bpf_core_cand_list *cands); where bpf_core_relo and bpf_core_cand_list are simple types prepared by kernel and libbpf. Though diff stat shows a lot of lines inserted/deleted they are moved lines. Pls review with diff.colorMoved. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-07-26	libbpf: Split CO-RE logic into relo_core.c.	Alexei Starovoitov	5	-1297/+1319
	Move CO-RE logic into separate file. The internal interface between libbpf and CO-RE is through bpf_core_apply_relo_insn() function and few structs defined in relo_core.h. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721000822.40958-5-alexei.starovoitov@gmail.com
2021-07-26	libbpf: Move CO-RE types into relo_core.h.	Alexei Starovoitov	3	-93/+102
	In order to make a clean split of CO-RE logic move its types into independent header file. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721000822.40958-4-alexei.starovoitov@gmail.com
2021-07-26	libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.	Alexei Starovoitov	1	-46/+71
	bpf_core_apply_relo() doesn't need to know bpf_program internals and hashmap details. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721000822.40958-3-alexei.starovoitov@gmail.com
2021-07-26	libbpf: Cleanup the layering between CORE and bpf_program.	Alexei Starovoitov	1	-36/+38
	CO-RE processing functions don't need to know 'struct bpf_program' details. Cleanup the layering to eventually be able to move CO-RE logic into a separate file. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721000822.40958-2-alexei.starovoitov@gmail.com
2021-07-24	bpf/tests: Do not PASS tests without actually testing the result	Johan Almbladh	1	-1/+8
	Each test case can have a set of sub-tests, where each sub-test can run the cBPF/eBPF test snippet with its own data_size and expected result. Before, the end of the sub-test array was indicated by both data_size and result being zero. However, most or all of the internal eBPF tests has a data_size of zero already. When such a test also had an expected value of zero, the test was never run but reported as PASS anyway. Now the test runner always runs the first sub-test, regardless of the data_size and result values. The sub-test array zero-termination only applies for any additional sub-tests. There are other ways fix it of course, but this solution at least removes the surprise of eBPF tests with a zero result always succeeding. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721103822.3755111-1-johan.almbladh@anyfinetworks.com
2021-07-24	bpf/tests: Fix copy-and-paste error in double word test	Johan Almbladh	1	-2/+2
	This test now operates on DW as stated instead of W, which was already covered by another test. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721104058.3755254-1-johan.almbladh@anyfinetworks.com
2021-07-24	selftests/bpf: Document vmtest.sh dependencies	Evgeniy Litvinenko	1	-0/+7
	Add a list of vmtest script dependencies to make it easier for new contributors to get going. Signed-off-by: Evgeniy Litvinenko <evgeniyl@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210723223645.907802-1-evgeniyl@fb.com
2021-07-24	libbpf: Add bpf_map__pin_path function	Evgeniy Litvinenko	4	-0/+16
	Add bpf_map__pin_path, so that the inconsistently named bpf_map__get_pin_path can be deprecated later. This is part of the effort towards libbpf v1.0: https://github.com/libbpf/libbpf/issues/307 Also, add a selftest for the new function. Signed-off-by: Evgeniy Litvinenko <evgeniyl@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210723221511.803683-1-evgeniyl@fb.com
2021-07-24	Merge branch 'bpf: Allow bpf tcp iter to do bpf_(get\|set)sockopt'	Andrii Nakryiko	12	-97/+784
	Martin KaFai says: ==================== This set is to allow bpf tcp iter to call bpf_(get\|set)sockopt. With bpf-tcp-cc, new algo rollout happens more often. Instead of restarting the applications to pick up the new tcp-cc, this set allows the bpf tcp iter to call bpf_(get\|set)sockopt(TCP_CONGESTION). It is not limited to TCP_CONGESTION, the bpf tcp iter can call bpf_(get\|set)sockopt() with other options. The bpf tcp iter can read into all the fields of a tcp_sock, so there is a lot of flexibility to select the desired sk to do setsockopt(), e.g. it can test for TCP_LISTEN only and leave the established connections untouched, or check the addr/port, or check the current tcp-cc name, ...etc. Patch 1-4 are some cleanup and prep work in the tcp and bpf seq_file. Patch 5 is to have the tcp seq_file iterate on the port+addr lhash2 instead of the port only listening_hash. Patch 6 is to have the bpf tcp iter doing batching which then allows lock_sock. lock_sock is needed for setsockopt. Patch 7 allows the bpf tcp iter to call bpf_(get\|set)sockopt. v2: - Use __GFP_NOWARN in patch 6 - Add bpf_getsockopt() in patch 7 to give a symmetrical user experience. selftest in patch 8 is changed to also cover bpf_getsockopt(). - Remove CAP_NET_ADMIN check in patch 7. Tracing bpf prog has already required CAP_SYS_ADMIN or CAP_PERFMON. - Move some def macros to bpf_tracing_net.h in patch 8 ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-07-24	bpf: selftest: Test batching and bpf_(get\|set)sockopt in bpf tcp iter	Martin KaFai Lau	5	-9/+384
	This patch adds tests for the batching and bpf_(get\|set)sockopt in bpf tcp iter. It first creates: a) 1 non SO_REUSEPORT listener in lhash2. b) 256 passive and active fds connected to the listener in (a). c) 256 SO_REUSEPORT listeners in one of the lhash2 bucket. The test sets all listeners and connections to bpf_cubic before running the bpf iter. The bpf iter then calls setsockopt(TCP_CONGESTION) to switch each listener and connection from bpf_cubic to bpf_dctcp. The bpf iter has a random_retry mode such that it can return EAGAIN to the usespace in the middle of a batch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200625.1036874-1-kafai@fb.com
2021-07-24	bpf: tcp: Support bpf_(get\|set)sockopt in bpf tcp iter	Martin KaFai Lau	5	-1/+85
	This patch allows bpf tcp iter to call bpf_(get\|set)sockopt. To allow a specific bpf iter (tcp here) to call a set of helpers, get_func_proto function pointer is added to bpf_iter_reg. The bpf iter is a tracing prog which currently requires CAP_PERFMON or CAP_SYS_ADMIN, so this patch does not impose other capability checks for bpf_(get\|set)sockopt. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200619.1036715-1-kafai@fb.com
2021-07-24	bpf: tcp: Bpf iter batching and lock_sock	Martin KaFai Lau	1	-6/+231
	This patch does batching and lock_sock for the bpf tcp iter. It does not affect the proc fs iteration. With bpf-tcp-cc, new algo rollout happens more often. Instead of restarting the application to pick up the new tcp-cc, the next patch will allow bpf iter to do setsockopt(TCP_CONGESTION). This requires locking the sock. Also, unlike the proc iteration (cat /proc/net/tcp[6]), the bpf iter can inspect all fields of a tcp_sock. It will be useful to have a consistent view on some of the fields (e.g. the ones reported in tcp_get_info() that also acquires the sock lock). Double lock: locking the bucket first and then locking the sock could lead to deadlock. This patch takes a batching approach similar to inet_diag. While holding the bucket lock, it batch a number of sockets into an array first and then unlock the bucket. Before doing show(), it then calls lock_sock_fast(). In a machine with ~400k connections, the maximum number of sk in a bucket of the established hashtable is 7. 0.02% of the established connections fall into this bucket size. For listen hash (port+addr lhash2), the bucket is usually very small also except for the SO_REUSEPORT use case which the userspace could have one SO_REUSEPORT socket per thread. While batching is used, it can also minimize the chance of missing sock in the setsockopt use case if the whole bucket is batched. This patch will start with a batch array with INIT_BATCH_SZ (16) which will be enough for the most common cases. bpf_iter_tcp_batch() will try to realloc to a larger array to handle exception case (e.g. the SO_REUSEPORT case in the lhash2). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com
2021-07-24	tcp: seq_file: Replace listening_hash with lhash2	Martin KaFai Lau	2	-17/+24
	This patch moves the tcp seq_file iteration on listeners from the port only listening_hash to the port+addr lhash2. When iterating from the bpf iter, the next patch will need to lock the socket such that the bpf iter can call setsockopt (e.g. to change the TCP_CONGESTION). To avoid locking the bucket and then locking the sock, the bpf iter will first batch some sockets from the same bucket and then unlock the bucket. If the bucket size is small (which usually is), it is easier to batch the whole bucket such that it is less likely to miss a setsockopt on a socket due to changes in the bucket. However, the port only listening_hash could have many listeners hashed to a bucket (e.g. many individual VIP(s):443 and also multiple by the number of SO_REUSEPORT). We have seen bucket size in tens of thousands range. Also, the chance of having changes in some popular port buckets (e.g. 443) is also high. The port+addr lhash2 was introduced to solve this large listener bucket issue. Also, the listening_hash usage has already been replaced with lhash2 in the fast path inet[6]_lookup_listener(). This patch follows the same direction on moving to lhash2 and iterates the lhash2 instead of listening_hash. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200606.1035783-1-kafai@fb.com
2021-07-24	tcp: seq_file: Add listening_get_first()	Martin KaFai Lau	1	-20/+39
	The current listening_get_next() is overloaded by passing NULL to the 2nd arg, like listening_get_next(seq, NULL), to mean get_first(). This patch moves some logic from the listening_get_next() into a new function listening_get_first(). It will be equivalent to the current established_get_first() and established_get_next() setup. get_first() is to find a non empty bucket and return the first sk. get_next() is to find the next sk of the current bucket and then resorts to get_first() if the current bucket is exhausted. The next patch is to move the listener seq_file iteration from listening_hash (port only) to lhash2 (port+addr). Separating out listening_get_first() from listening_get_next() here will make the following lhash2 changes cleaner and easier to follow. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200600.1035353-1-kafai@fb.com
2021-07-24	bpf: tcp: seq_file: Remove bpf_seq_afinfo from tcp_iter_state	Martin KaFai Lau	2	-21/+5
	A following patch will create a separate struct to store extra bpf_iter state and it will embed the existing tcp_iter_state like this: struct bpf_tcp_iter_state { struct tcp_iter_state state; /* More bpf_iter specific states here ... / } As a prep work, this patch removes the "struct tcp_seq_afinfo bpf_seq_afinfo" where its purpose is to tell if it is iterating from bpf_iter instead of proc fs. Currently, if "bpf_seq_afinfo" is not NULL, it is iterating from bpf_iter. The kernel should not filter by the addr family and leave this filtering decision to the bpf prog. Instead of adding a "bpf_seq_afinfo" pointer, this patch uses the "seq->op == &bpf_iter_tcp_seq_ops" test to tell if it is iterating from the bpf iter. The bpf_iter_(init\|fini)_tcp() is left here to prepare for the change of a following patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200554.1034982-1-kafai@fb.com
2021-07-24	tcp: seq_file: Refactor net and family matching	Martin KaFai Lau	1	-38/+30
	This patch refactors the net and family matching into two new helpers, seq_sk_match() and seq_file_family(). seq_file_family() is in the later part of the file to prepare the change of a following patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200548.1034629-1-kafai@fb.com
2021-07-24	tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos	Martin KaFai Lau	1	-2/+3
	st->bucket stores the current bucket number. st->offset stores the offset within this bucket that is the sk to be seq_show(). Thus, st->offset only makes sense within the same st->bucket. These two variables are an optimization for the common no-lseek case. When resuming the seq_file iteration (i.e. seq_start()), tcp_seek_last_pos() tries to continue from the st->offset at bucket st->bucket. However, it is possible that the bucket pointed by st->bucket has changed and st->offset may end up skipping the whole st->bucket without finding a sk. In this case, tcp_seek_last_pos() currently continues to satisfy the offset condition in the next (and incorrect) bucket. Instead, regardless of the offset value, the first sk of the next bucket should be returned. Thus, "bucket == st->bucket" check is added to tcp_seek_last_pos(). The chance of hitting this is small and the issue is a decade old, so targeting for the next tree. Fixes: a8b690f98baf ("tcp: Fix slowness in read /proc/net/tcp") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200541.1033917-1-kafai@fb.com
2021-07-23	libbpf: Export bpf_program__attach_kprobe_opts function	Jiri Olsa	3	-14/+33
	Export bpf_program__attach_kprobe_opts as a public API. Rename bpf_program_attach_kprobe_opts to bpf_kprobe_opts and turn it into OPTS struct. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20210721215810.889975-4-jolsa@kernel.org
2021-07-23	libbpf: Allow decimal offset for kprobes	Jiri Olsa	3	-1/+14
	Allow to specify decimal offset in SEC macro, like: SEC("kprobe/bpf_fentry_test7+5") Add selftest for that. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20210721215810.889975-3-jolsa@kernel.org
2021-07-23	libbpf: Fix func leak in attach_kprobe	Jiri Olsa	1	-0/+1
	Add missing free() for func pointer in attach_kprobe function. Fixes: a2488b5f483f ("libbpf: Allow specification of "kprobe/function+offset"") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20210721215810.889975-2-jolsa@kernel.org
2021-07-23	selftests/bpf: Mute expected invalid map creation error msg	Martynas Pumputis	1	-0/+5
	Previously, the newly introduced test case in test_map_in_map(), which checks whether the inner map is destroyed after unsuccessful creation of the outer map, logged the following harmless and expected error: libbpf: map 'mim': failed to create: Invalid argument(-22) libbpf: failed to load object './test_map_in_map_invalid.o' To avoid any possible confusion, mute the logging during loading of the prog. Fixes: 08f71a1e39a1 ("selftests/bpf: Check inner map deletion") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721140941.563175-1-m@lambda.lt
2021-07-23	bpf: Remove redundant intiialization of variable stype	Colin Ian King	1	-1/+1
	The variable stype is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721115630.109279-1-colin.king@canonical.com
2021-07-23	bpf: Fix pointer cast warning	Arnd Bergmann	1	-1/+1
	kp->addr is a pointer, so it cannot be cast directly to a 'u64' when it gets interpreted as an integer value: kernel/trace/bpf_trace.c: In function '____bpf_get_func_ip_kprobe': kernel/trace/bpf_trace.c:968:21: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] 968 \| return kp ? (u64) kp->addr : 0; Use the uintptr_t type instead. Fixes: 9ffd9f3ff719 ("bpf: Add bpf_get_func_ip helper for kprobe programs") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210721212007.3876595-1-arnd@kernel.org
2021-07-20	Merge branch 'libbpf: btf typed data dumping fixes (__int128 usage, error ↵	Andrii Nakryiko	2	-35/+85
	propagation)' Alan Maguire says: ==================== This series aims to resolve further issues with the BTF typed data dumping interfaces in libbpf. Compilation failures with use of __int128 on 32-bit platforms were reported [1]. As a result, the use of __int128 in libbpf typed data dumping is replaced with __u64 usage for bitfield manipulations. In the case of 128-bit integer values, they are simply split into two 64-bit hex values for display (patch 1). Tests are added for __int128 display in patch 2, using conditional compilation to avoid problems with a lack of __int128 support. Patch 3 resolves an issue Andrii noted about error propagation when handling enum data display. More followup work is required to ensure multi-dimensional char array display works correctly. [1] https://lore.kernel.org/bpf/1626362126-27775-1-git-send-email-alan.maguire@oracle.com/T/#mc2cb023acfd6c3cd0b661e385787b76bb757430d Changes since v1: - added error handling for bitfield size > 64 bits by changing function signature for bitfield retrieval to return an int error value and to set bitfield value via a __u64 * argument (Andrii) ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-07-20	libbpf: Propagate errors when retrieving enum value for typed data display	Alan Maguire	1	-2/+3
	When retrieving the enum value associated with typed data during "is data zero?" checking in btf_dump_type_data_check_zero(), the return value of btf_dump_get_enum_value() is not passed to the caller if the function returns a non-zero (error) value. Currently, 0 is returned if the function returns an error. We should instead propagate the error to the caller. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626770993-11073-4-git-send-email-alan.maguire@oracle.com
2021-07-20	selftests/bpf: Add __int128-specific tests for typed data dump	Alan Maguire	1	-0/+17
	Add tests for __int128 display for platforms that support it. __int128s are dumped as hex values. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626770993-11073-3-git-send-email-alan.maguire@oracle.com
2021-07-20	libbpf: Avoid use of __int128 in typed dump display	Alan Maguire	1	-33/+65
	__int128 is not supported for some 32-bit platforms (arm and i386). __int128 was used in carrying out computations on bitfields which aid display, but the same calculations could be done with __u64 with the small effect of not supporting 128-bit bitfields. With these changes, a big-endian issue with casting 128-bit integers to 64-bit for enum bitfields is solved also, as we now use 64-bit integers for bitfield calculations. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626770993-11073-2-git-send-email-alan.maguire@oracle.com
2021-07-20	selftests, bpf: test_tc_tunnel.sh nc: Cannot use -p and -l	Vincent Li	1	-1/+1
	When run test_tc_tunnel.sh, it complains following error ipip encap 192.168.1.1 to 192.168.1.2, type ipip, mac none len 100 test basic connectivity nc: cannot use -p and -l nc man page has: -l Listen for an incoming connection rather than initiating a connection to a remote host.Cannot be used together with any of the options -psxz. Additionally, any timeouts specified with the -w option are ignored. Correct nc in server_listen(). Signed-off-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210719223022.66681-1-vincent.mc.li@gmail.com