summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-08-14Merge branch 'bpf-libbpf-read-sysfs-btf'Daniel Borkmann4-32/+82
Andrii Nakryiko says: ==================== Now that kernel's BTF is exposed through sysfs at well-known location, attempt to load it first as a target BTF for the purpose of BPF CO-RE relocations. Patch #1 is a follow-up patch to rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux. Patch #2 adds ability to load raw BTF contents from sysfs and expands the list of locations libbpf attempts to load vmlinux BTF from. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-14libbpf: attempt to load kernel BTF from sysfs firstAndrii Nakryiko1-7/+57
Add support for loading kernel BTF from sysfs (/sys/kernel/btf/vmlinux) as a target BTF. Also extend the list of on disk search paths for vmlinux ELF image with entries that perf is searching for. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-14btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinuxAndrii Nakryiko3-25/+25
Expose kernel's BTF under the name vmlinux to be more uniform with using kernel module names as file names in the future. Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-13selftests/bpf: fix race in flow dissector testsPetar Penkov2-8/+27
Since the "last_dissection" map holds only the flow keys for the most recent packet, there is a small race in the skb-less flow dissector tests if a new packet comes between transmitting the test packet, and reading its keys from the map. If this happens, the test packet keys will be overwritten and the test will fail. Changing the "last_dissection" map to a hash map, keyed on the source/dest port pair resolves this issue. Additionally, let's clear the last test results from the map between tests to prevent previous test cases from interfering with the following test cases. Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-13tools: bpftool: add feature check for zlibPeter Wu1-3/+8
bpftool requires libelf, and zlib for decompressing /proc/config.gz. zlib is a transitive dependency via libelf, and became mandatory since elfutils 0.165 (Jan 2016). The feature check of libelf is already done in the elfdep target of tools/lib/bpf/Makefile, pulled in by bpftool via a dependency on libbpf.a. Add a similar feature check for zlib. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Peter Wu <peter@lekensteyn.nl> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-13btf: expose BTF info through sysfsAndrii Nakryiko4-19/+104
Make .BTF section allocated and expose its contents through sysfs. /sys/kernel/btf directory is created to contain all the BTFs present inside kernel. Currently there is only kernel's main BTF, represented as /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported, each module will expose its BTF as /sys/kernel/btf/<module-name> file. Current approach relies on a few pieces coming together: 1. pahole is used to take almost final vmlinux image (modulo .BTF and kallsyms) and generate .BTF section by converting DWARF info into BTF. This section is not allocated and not mapped to any segment, though, so is not yet accessible from inside kernel at runtime. 2. objcopy dumps .BTF contents into binary file and subsequently convert binary file into linkable object file with automatically generated symbols _binary__btf_kernel_bin_start and _binary__btf_kernel_bin_end, pointing to start and end, respectively, of BTF raw data. 3. final vmlinux image is generated by linking this object file (and kallsyms, if necessary). sysfs_btf.c then creates /sys/kernel/btf/kernel file and exposes embedded BTF contents through it. This allows, e.g., libbpf and bpftool access BTF info at well-known location, without resorting to searching for vmlinux image on disk (location of which is not standardized and vmlinux image might not be even available in some scenarios, e.g., inside qemu during testing). Alternative approach using .incbin assembler directive to embed BTF contents directly was attempted but didn't work, because sysfs_proc.o is not re-compiled during link-vmlinux.sh stage. This is required, though, to update embedded BTF data (initially empty data is embedded, then pahole generates BTF info and we need to regenerate sysfs_btf.o with updated contents, but it's too late at that point). If BTF couldn't be generated due to missing or too old pahole, sysfs_btf.c handles that gracefully by detecting that _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating /sys/kernel/btf at all. v2->v3: - added Documentation/ABI/testing/sysfs-kernel-btf (Greg K-H); - created proper kobject (btf_kobj) for btf directory (Greg K-H); - undo v2 change of reusing vmlinux, as it causes extra kallsyms pass due to initially missing __binary__btf_kernel_bin_{start/end} symbols; v1->v2: - allow kallsyms stage to re-use vmlinux generated by gen_btf(); Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-12tools: bpftool: fix reading from /proc/config.gzPeter Wu2-53/+54
/proc/config has never existed as far as I can see, but /proc/config.gz is present on Arch Linux. Add support for decompressing config.gz using zlib which is a mandatory dependency of libelf anyway. Replace existing stdio functions with gzFile operations since the latter transparently handles uncompressed and gzip-compressed files. Cc: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-09Merge branch 'bpf-xdp-fwd-sample-improvements'Daniel Borkmann2-21/+53
Jesper Dangaard Brouer says: ==================== V3: Hopefully fixed all issues point out by Yonghong Song V2: Addressed issues point out by Yonghong Song - Please ACK patch 2/3 again - Added ACKs and reviewed-by to other patches This patchset is focused on improvements for XDP forwarding sample named xdp_fwd, which leverage the existing FIB routing tables as described in LPC2018[1] talk by David Ahern. The primary motivation is to illustrate how Toke's recent work improves usability of XDP_REDIRECT via lookups in devmap. The other patches are to help users understand the sample. I have more improvements to xdp_fwd, but those might requires changes to libbpf. Thus, sending these patches first as they are isolated. [1] http://vger.kernel.org/lpc-networking2018.html#session-1 ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-09samples/bpf: xdp_fwd explain bpf_fib_lookup return codesJesper Dangaard Brouer1-2/+17
Make it clear that this XDP program depend on the network stack to do the ARP resolution. This is connected with the BPF_FIB_LKUP_RET_NO_NEIGH return code from bpf_fib_lookup(). Another common mistake (seen via XDP-tutorial) is that users don't realize that sysctl net.ipv{4,6}.conf.all.forwarding setting is honored by bpf_fib_lookup. Reported-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-09samples/bpf: make xdp_fwd more practically usable via devmap lookupJesper Dangaard Brouer2-17/+33
This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out that the chosen egress index should be checked for existence in the devmap. This can now be done via taking advantage of Toke's work in commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF"). This change makes xdp_fwd more practically usable, as this allows for a mixed environment, where IP-forwarding fallback to network stack, if the egress device isn't configured to use XDP. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-09samples/bpf: xdp_fwd rename devmap name to be xdp_tx_portsJesper Dangaard Brouer2-3/+4
The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map which only have a single TX port. Change name to xdp_tx_ports to make it more descriptive. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-09xdp: xdp_umem: fix umem pages mapping for 32bits systemsIvan Khoronzhuk1-1/+11
Use kmap instead of page_address as it's not always in low memory. Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-08tools/bpf: fix core_reloc.c compilation errorYonghong Song1-1/+1
On my local machine, I have the following compilation errors: ===== In file included from prog_tests/core_reloc.c:3:0: ./progs/core_reloc_types.h:517:46: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘fancy_char_ptr_t’ typedef const char * const volatile restrict fancy_char_ptr_t; ^ ./progs/core_reloc_types.h:527:2: error: unknown type name ‘fancy_char_ptr_t’ fancy_char_ptr_t d; ^ ===== I am using gcc 4.8.5. Later compilers may change their behavior not emitting the error. Nevertheless, let us fix the issue. "restrict" can be tested without typedef. Fixes: 9654e2ae908e ("selftests/bpf: add CO-RE relocs modifiers/typedef tests") Cc: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08Merge branch 'compile-once-run-everywhere'Alexei Starovoitov62-281/+2981
Andrii Nakryiko says: ==================== This patch set implements central part of CO-RE (Compile Once - Run Everywhere, see [0] and [1] for slides and video): relocating fields offsets. Most of the details are written down as comments to corresponding parts of the code. Patch #1 adds a bunch of commonly useful btf_xxx helpers to simplify working with BTF types. Patch #2 converts existing libbpf code to these new helpers and removes some of pre-existing ones. Patch #3 adds loading of .BTF.ext offset relocations section and macros to work with its contents. Patch #4 implements CO-RE relocations algorithm in libbpf. Patch #5 introduced BPF_CORE_READ macro, hiding usage of Clang's __builtin_preserve_access_index intrinsic that records offset relocation. Patches #6-#14 adds selftests validating various parts of relocation handling, type compatibility, etc. For all tests to work, you'll need latest Clang/LLVM supporting __builtin_preserve_access_index intrinsic, used for recording offset relocations. Kernel on which selftests run should have BTF information built in (CONFIG_DEBUG_INFO_BTF=y). [0] http://vger.kernel.org/bpfconf2019.html#session-2 [1] http://vger.kernel.org/lpc-bpf2018.html#session-2 v5->v6: - fix bad comment formatting for real (Alexei); v4->v5: - drop constness for btf_xxx() helpers, allowing to avoid type casts (Alexei); - rebase on latest bpf-next, change test__printf back to printf; v3->v4: - added btf_xxx helpers (Alexei); - switched libbpf code to new helpers; - reduced amount of logging and simplified format in few places (Alexei); - made flavor name parsing logic more strict (exactly three underscores); - no uname() error checking (Alexei); - updated misc tests to reflect latest Clang fixes (Yonghong); v2->v3: - enclose BPF_CORE_READ args in parens (Song); v1->v2: - add offsetofend(), fix btf_ext optional fields checks (Song); - add bpf_core_dump_spec() for logging spec representation; - move special first element processing out of the loop (Song); - typo fixes (Song); - drop BPF_ST | BPF_MEM insn relocation (Alexei); - extracted BPF_CORE_READ into bpf_helpers (Alexei); - added extra tests validating Clang capturing relocs correctly (Yonghong); - switch core_relocs.c to use sub-tests; - updated mods tests after Clang bug was fixed (Yonghong); - fix bug enumerating candidate types; ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs misc testsAndrii Nakryiko4-0/+106
Add tests validating few edge-cases of capturing offset relocations. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs ints testsAndrii Nakryiko11-0/+209
Add various tests validating handling compatible/incompatible integer types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs ptr-as-array testsAndrii Nakryiko5-0/+69
Add test validating correct relocation handling for cases where pointer to something is used as an array. E.g.: int *ptr = ...; int x = ptr[42]; Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs modifiers/typedef testsAndrii Nakryiko6-0/+170
Add tests validating correct handling of various combinations of typedefs and const/volatile/restrict modifiers. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs enum/ptr/func_proto testsAndrii Nakryiko10-0/+167
Test CO-RE relocation handling of ints, enums, pointers, func protos, etc. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs array testsAndrii Nakryiko11-0/+201
Add tests for various array handling/relocation scenarios. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs nesting testsAndrii Nakryiko16-0/+421
Add a bunch of test validating correct handling of nested structs/unions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs struct flavors testsAndrii Nakryiko5-0/+117
Add tests verifying that BPF program can use various struct/union "flavors" to extract data from the same target struct/union. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add CO-RE relocs testing setupAndrii Nakryiko2-0/+165
Add CO-RE relocation test runner. Add one simple test validating that libbpf's logic for searching for kernel image and loading BTF out of it works. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08selftests/bpf: add BPF_CORE_READ relocatable read macroAndrii Nakryiko1-0/+20
Add BPF_CORE_READ macro used in tests to do bpf_core_read(), which automatically captures offset relocation. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08libbpf: implement BPF CO-RE offset relocation algorithmAndrii Nakryiko2-18/+864
This patch implements the core logic for BPF CO-RE offsets relocations. Every instruction that needs to be relocated has corresponding bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying to match recorded "local" relocation spec against potentially many compatible "target" types, creating corresponding spec. Details of the algorithm are noted in corresponding comments in the code. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08libbpf: add .BTF.ext offset relocation section loadingAndrii Nakryiko3-42/+136
Add support for BPF CO-RE offset relocations. Add section/record iteration macros for .BTF.ext. These macro are useful for iterating over each .BTF.ext record, either for dumping out contents or later for BPF CO-RE relocation handling. To enable other parts of libbpf to work with .BTF.ext contents, moved a bunch of type definitions into libbpf_internal.h. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08libbpf: convert libbpf code to use new btf helpersAndrii Nakryiko3-221/+158
Simplify code by relying on newly added BTF helper functions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-08libbpf: add helpers for working with BTF typesAndrii Nakryiko1-0/+178
Add lots of frequently used helpers that simplify working with BTF types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07Merge branch 'test_progs-stdio'Alexei Starovoitov10-94/+84
Stanislav Fomichev says: ==================== I was looking into converting test_sockops* to test_progs framework and that requires using cgroup_helpers.c which rely on stdio/stderr. Let's use open_memstream to override stdout into buffer during subtests instead of custom test_{v,}printf wrappers. That lets us continue to use stdio in the subtests and dump it on failure if required. That would also fix bpf_find_map which currently uses printf to signal failure (missed during test_printf conversion). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: test_progs: drop extra trailing tabStanislav Fomichev1-1/+1
Small (un)related cleanup. Cc: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: test_progs: test__printf -> printfStanislav Fomichev10-38/+22
Now that test__printf is a simple wraper around printf, let's drop it (and test__vprintf as well). Cc: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-07selftests/bpf: test_progs: switch to open_memstreamStanislav Fomichev2-56/+62
Use open_memstream to override stdout during test execution. The copy of the original stdout is held in env.stdout and used to print subtest info and dump failed log. test_{v,}printf are now simple wrappers around stdout and will be removed in the next patch. v5: * fix -v crash by always setting env.std{in,err} (Alexei Starovoitov) * drop force_log check from stdio_hijack (Andrii Nakryiko) v4: * one field per line for stdout/stderr (Andrii Nakryiko) v3: * don't do strlen over log_buf, log_cnt has it already (Andrii Nakryiko) v2: * add ifdef __GLIBC__ around open_memstream (maybe pointless since we already depend on glibc for argp_parse) * hijack stderr as well (Andrii Nakryiko) * don't hijack for every test, do it once (Andrii Nakryiko) * log_cap -> log_size (Andrii Nakryiko) * do fseeko in a proper place (Andrii Nakryiko) * check open_memstream returned value (Andrii Nakryiko) Cc: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-06selftests/bpf: add loop test 5Alexei Starovoitov2-0/+33
Add a test with multiple exit conditions. It's not an infinite loop only when the verifier can properly track all math on variable 'i' through all possible ways of executing this loop. barrier()s are needed to disable llvm optimization that combines multiple branches into fewer branches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com>
2019-08-06selftests/bpf: add loop test 4Alexei Starovoitov2-0/+19
Add a test that returns a 'random' number between [0, 2^20) If state pruning is not working correctly for loop body the number of processed insns will be 2^20 * num_of_insns_in_loop_body and the program will be rejected. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com>
2019-08-01Merge branch 'setsockopt-extra-mem'Alexei Starovoitov3-4/+60
Stanislav Fomichev says: ==================== Current setsockopt hook is limited to the size of the buffer that user had supplied. Since we always allocate memory and copy the value into kernel space, allocate just a little bit more in case BPF program needs to override input data with a larger value. The canonical example is TCP_CONGESTION socket option where input buffer is a string and if user calls it with a short string, BPF program has no way of extending it. The tests are extended with TCP_CONGESTION use case. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-01selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use caseStanislav Fomichev2-0/+47
Ignore SOL_TCP:TCP_CONGESTION in getsockopt and always override SOL_TCP:TCP_CONGESTION with "cubic" in setsockopt hook. Call setsockopt(SOL_TCP, TCP_CONGESTION) with short optval ("nv") to make sure BPF program has enough buffer space to replace it with "cubic". Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-01bpf: always allocate at least 16 bytes for setsockopt hookStanislav Fomichev1-4/+13
Since we always allocate memory, allocate just a little bit more for the BPF program in case it need to override user input with bigger value. The canonical example is TCP_CONGESTION where input string might be too small to override (nv -> bbr or cubic). 16 bytes are chosen to match the size of TCP_CA_NAME_MAX and can be extended in the future if needed. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31tools: bpftool: add support for reporting the effective cgroup progsJakub Kicinski3-38/+76
Takshak said in the original submission: With different bpf attach_flags available to attach bpf programs specially with BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI, the list of effective bpf-programs available to any sub-cgroups really needs to be available for easy debugging. Using BPF_F_QUERY_EFFECTIVE flag, one can get the list of not only attached bpf-programs to a cgroup but also the inherited ones from parent cgroup. So a new option is introduced to use BPF_F_QUERY_EFFECTIVE query flag here to list all the effective bpf-programs available for execution at a specified cgroup. Reused modified test program test_cgroup_attach from tools/testing/selftests/bpf: # ./test_cgroup_attach With old bpftool: # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/ ID AttachType AttachFlags Name 271 egress multi pkt_cntr_1 272 egress multi pkt_cntr_2 Attached new program pkt_cntr_4 in cg2 gives following: # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 ID AttachType AttachFlags Name 273 egress override pkt_cntr_4 And with new "effective" option it shows all effective programs for cg2: # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 effective ID AttachType AttachFlags Name 273 egress override pkt_cntr_4 271 egress override pkt_cntr_1 272 egress override pkt_cntr_2 Compared to original submission use a local flag instead of global option. We need to clear query_flags on every command, in case batch mode wants to use varying settings. v2: (Takshak) - forbid duplicated flags; - fix cgroup path freeing. Signed-off-by: Takshak Chahande <ctakshak@fb.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Takshak Chahande <ctakshak@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31selftests/bpf: fix clearing buffered output between tests/subtestsAndrii Nakryiko1-1/+1
Clear buffered output once test or subtests finishes even if test was successful. Not doing this leads to accumulation of output from previous tests and on first failed tests lots of irrelevant output will be dumped, greatly confusing things. v1->v2: fix Fixes tag, add more context to patch Fixes: 3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31Merge branch 'gen-syn-cookie'Alexei Starovoitov11-22/+354
Petar Penkov says: ==================== This patch series introduces a BPF helper function that allows generating SYN cookies from BPF. Currently, this helper is enabled at both the TC hook and the XDP hook. The first two patches in the series add/modify several TCP helper functions to allow for SKB-less operation, as is the case at the XDP hook. The third patch introduces the bpf_tcp_gen_syncookie helper function which generates a SYN cookie for either XDP or TC programs. The return value of this function contains both the MSS value, encoded in the cookie, and the cookie itself. The last three patches sync tools/ and add a test. Performance evaluation: I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they carry no options. If I attached a simple program that checks if a packet is IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after swapping src/dest, recomputing the checksum, and setting the ACK flag, I observed 10Mpps being sent back out. Changes since v1: 1/ Added performance numbers to the cover letter 2/ Patch 2: Refactored a bit to fix compilation issues 3/ Patch 3: Changed ENOTSUPP to EOPNOTSUPP at Toke's suggestion Changes since RFC: 1/ Cookie is returned in host order at Alexei's suggestion 2/ If cookies are not enabled via a sysctl, the helper function returns -ENOENT instead of -EINVAL at Lorenz's suggestion 3/ Fixed documentation to properly reflect that MSS is 16 bits at Lorenz's suggestion 4/ BPF helper requires TCP length to match ->doff field, rather than to simply be no more than 20 bytes at Eric and Alexei's suggestion 5/ Packet type is looked up from the packet version field, rather than from the socket. v4 packets are rejected on v6-only sockets but should work with dual stack listeners at Eric's suggestion 6/ Removed unnecessary `net` argument from helper function in patch 2 at Lorenz's suggestion 7/ Changed test to only pass MSS option so we can convince the verifier that the memory access is not out of bounds Note that 7/ below illustrates the verifier might need to be extended to allow passing a variable tcph->doff to the helper function like below: __u32 thlen = tcph->doff * 4; if (thlen < sizeof(*tcph)) return; __s64 cookie = bpf_tcp_gen_syncookie(sk, ipv4h, 20, tcph, thlen); ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31selftests/bpf: add test for bpf_tcp_gen_syncookiePetar Penkov3-13/+99
Modify the existing bpf_tcp_check_syncookie test to also generate a SYN cookie, pass the packet to the kernel, and verify that the two cookies are the same (and both valid). Since cloned SKBs are skipped during generic XDP, this test does not issue a SYN cookie when run in XDP mode. We therefore only check that a valid SYN cookie was issued at the TC hook. Additionally, verify that the MSS for that SYN cookie is within expected range. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpersPetar Penkov1-0/+3
Expose bpf_tcp_gen_syncookie to selftests. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31bpf: sync bpf.h to tools/Petar Penkov1-3/+34
Sync updated documentation for bpf_redirect_map. Sync the bpf_tcp_gen_syncookie helper function definition with the one in tools/uapi. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31bpf: add bpf_tcp_gen_syncookie helperPetar Penkov2-1/+102
This helper function allows BPF programs to try to generate SYN cookies, given a reference to a listener socket. The function works from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a socket in both cases. Signed-off-by: Petar Penkov <ppenkov@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31tcp: add skb-less helpers to retrieve SYN cookiePetar Penkov4-0/+113
This patch allows generation of a SYN cookie before an SKB has been allocated, as is the case at XDP. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-31tcp: tcp_syn_flood_action read port from socketPetar Penkov1-5/+3
This allows us to call this function before an SKB has been allocated. Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29Merge branch 'devmap_hash'Alexei Starovoitov13-65/+321
Toke Høiland-Jørgensen says: ==================== This series adds a new map type, devmap_hash, that works like the existing devmap type, but using a hash-based indexing scheme. This is useful for the use case where a devmap is indexed by ifindex (for instance for use with the routing table lookup helper). For this use case, the regular devmap needs to be sized after the maximum ifindex number, not the number of devices in it. A hash-based indexing scheme makes it possible to size the map after the number of devices it should contain instead. This was previously part of my patch series that also turned the regular bpf_redirect() helper into a map-based one; for this series I just pulled out the patches that introduced the new map type. Changelog: v5: - Dynamically set the number of hash buckets by rounding up max_entries to the nearest power of two (mirroring the regular hashmap), as suggested by Jesper. v4: - Remove check_memlock parameter that was left over from an earlier patch series. - Reorder struct members to avoid holes. v3: - Rework the split into different patches - Use spin_lock_irqsave() - Also add documentation and bash completion definitions for bpftool v2: - Split commit adding the new map type so uapi and tools changes are separate. Changes to these patches since the previous series: - Rebase on top of the other devmap changes (makes this one simpler!) - Don't enforce key==val, but allow arbitrary indexes. - Rename the type to devmap_hash to reflect the fact that it's just a hashmap now. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29tools: Add definitions for devmap_hash map typeToke Høiland-Jørgensen4-4/+21
This adds selftest and bpftool updates for the devmap_hash map type. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29tools/libbpf_probes: Add new devmap_hash typeToke Høiland-Jørgensen1-0/+1
This adds the definition for BPF_MAP_TYPE_DEVMAP_HASH to libbpf_probes.c in tools/lib/bpf. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29tools/include/uapi: Add devmap_hash BPF map typeToke Høiland-Jørgensen1-0/+1
This adds the devmap_hash BPF map type to the uapi headers in tools/. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>