summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2019-06-19selftests/bpf: add realistic loop testsAlexei Starovoitov15-12/+1347
Add a bunch of loop tests. Most of them are created by replacing '#pragma unroll' with '#pragma clang loop unroll(disable)' Several tests are artificially large: /* partial unroll. llvm will unroll loop ~150 times. * C loop count -> 600. * Asm loop count -> 4. * 16k insns in loop body. * Total of 5 such loops. Total program size ~82k insns. */ "./pyperf600.o", /* no unroll at all. * C loop count -> 600. * ASM loop count -> 600. * ~110 insns in loop body. * Total of 5 such loops. Total program size ~1500 insns. */ "./pyperf600_nounroll.o", /* partial unroll. 19k insn in a loop. * Total program size 20.8k insn. * ~350k processed_insns */ "./strobemeta.o", Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-19selftests/bpf: add basic verifier tests for loopsAlexei Starovoitov1-0/+161
This set of tests is a rewrite of Edward's earlier tests: https://patchwork.ozlabs.org/patch/877221/ Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-19selftests/bpf: fix testsAlexei Starovoitov3-20/+24
Fix tests that assumed no loops. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-19selftests/bpf: fix tests due to const spill/fillAlexei Starovoitov2-14/+17
fix tests that incorrectly assumed that the verifier cannot track constants through stack. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-19libbpf: constify getter APIsAndrii Nakryiko2-70/+72
Add const qualifiers to bpf_object/bpf_program/bpf_map arguments for getter APIs. There is no need for them to not be const pointers. Verified that make -C tools/lib/bpf make -C tools/testing/selftests/bpf make -C tools/perf all build without warnings. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18selftests/bpf: convert tests w/ custom values to BTF-defined mapsAndrii Nakryiko14-140/+285
Convert a bulk of selftests that have maps with custom (not integer) key and/or value. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18selftests/bpf: switch BPF_ANNOTATE_KV_PAIR tests to BTF-defined mapsAndrii Nakryiko5-72/+87
Switch tests that already rely on BTF to BTF-defined map definitions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18selftests/bpf: add test for BTF-defined mapsAndrii Nakryiko2-7/+76
Add file test for BTF-defined map definition. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: allow specifying map definitions using BTFAndrii Nakryiko2-9/+345
This patch adds support for a new way to define BPF maps. It relies on BTF to describe mandatory and optional attributes of a map, as well as captures type information of key and value naturally. This eliminates the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are always in sync with the key/value type. Relying on BTF, this approach allows for both forward and backward compatibility w.r.t. extending supported map definition features. By default, any unrecognized attributes are treated as an error, but it's possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added in the future will need to be optional. The outline of the new map definition (short, BTF-defined maps) is as follows: 1. All the maps should be defined in .maps ELF section. It's possible to have both "legacy" map definitions in `maps` sections and BTF-defined maps in .maps sections. Everything will still work transparently. 2. The map declaration and initialization is done through a global/static variable of a struct type with few mandatory and extra optional fields: - type field is mandatory and specified type of BPF map; - key/value fields are mandatory and capture key/value type/size information; - max_entries attribute is optional; if max_entries is not specified or initialized, it has to be provided in runtime through libbpf API before loading bpf_object; - map_flags is optional and if not defined, will be assumed to be 0. 3. Key/value fields should be **a pointer** to a type describing key/value. The pointee type is assumed (and will be recorded as such and used for size determination) to be a type describing key/value of the map. This is done to save excessive amounts of space allocated in corresponding ELF sections for key/value of big size. 4. As some maps disallow having BTF type ID associated with key/value, it's possible to specify key/value size explicitly without associating BTF type ID with it. Use key_size and value_size fields to do that (see example below). Here's an example of simple ARRAY map defintion: struct my_value { int x, y, z; }; struct { int type; int max_entries; int *key; struct my_value *value; } btf_map SEC(".maps") = { .type = BPF_MAP_TYPE_ARRAY, .max_entries = 16, }; This will define BPF ARRAY map 'btf_map' with 16 elements. The key will be of type int and thus key size will be 4 bytes. The value is struct my_value of size 12 bytes. This map can be used from C code exactly the same as with existing maps defined through struct bpf_map_def. Here's an example of STACKMAP definition (which currently disallows BTF type IDs for key/value): struct { __u32 type; __u32 max_entries; __u32 map_flags; __u32 key_size; __u32 value_size; } stackmap SEC(".maps") = { .type = BPF_MAP_TYPE_STACK_TRACE, .max_entries = 128, .map_flags = BPF_F_STACK_BUILD_ID, .key_size = sizeof(__u32), .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id), }; This approach is naturally extended to support map-in-map, by making a value field to be another struct that describes inner map. This feature is not implemented yet. It's also possible to incrementally add features like pinning with full backwards and forward compatibility. Support for static initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs is also on the roadmap. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: split initialization and loading of BTFAndrii Nakryiko1-10/+24
Libbpf does sanitization of BTF before loading it into kernel, if kernel doesn't support some of newer BTF features. This removes some of the important information from BTF (e.g., DATASEC and VAR description), which will be used for map construction. This patch splits BTF processing into initialization step, in which BTF is initialized from ELF and all the original data is still preserved; and sanitization/loading step, which ensures that BTF is safe to load into kernel. This allows to use full BTF information to construct maps, while still loading valid BTF into older kernels. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: identify maps by section index in addition to offsetAndrii Nakryiko1-15/+25
To support maps to be defined in multiple sections, it's important to identify map not just by offset within its section, but section index as well. This patch adds tracking of section index. For global data, we record section index of corresponding .data/.bss/.rodata ELF section for uniformity, and thus don't need a special value of offset for those maps. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: refactor map initializationAndrii Nakryiko1-114/+133
User and global data maps initialization has gotten pretty complicated and unnecessarily convoluted. This patch splits out the logic for global data map and user-defined map initialization. It also removes the restriction of pre-calculating how many maps will be initialized, instead allowing to keep adding new maps as they are discovered, which will be used later for BTF-defined map definitions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: streamline ELF parsing error-handlingAndrii Nakryiko1-24/+20
Simplify ELF parsing logic by exiting early, as there is no common clean up path to execute. That makes it unnecessary to track when err was set and when it was cleared. It also reduces nesting in some places. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: extract BTF loading logicAndrii Nakryiko1-38/+55
As a preparation for adding BTF-based BPF map loading, extract .BTF and .BTF.ext loading logic. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18libbpf: add common min/max macro to libbpf_internal.hAndrii Nakryiko5-15/+10
Multiple files in libbpf redefine their own definitions for min/max. Let's define them in libbpf_internal.h and use those everywhere. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15selftests/bpf: convert socket_cookie test to sk storageStanislav Fomichev2-32/+38
This lets us test that both BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS can access underlying bpf_sock. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf/tools: sync bpf.hStanislav Fomichev1-0/+2
Add sk to struct bpf_sock_addr and struct bpf_sock_ops. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: Add test for SO_REUSEPORT_DETACH_BPFMartin KaFai Lau1-0/+54
This patch adds a test for the new sockopt SO_REUSEPORT_DETACH_BPF. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: Sync asm-generic/socket.h to tools/Martin KaFai Lau1-0/+147
SO_DETACH_REUSEPORT_BPF is needed for the test in the next patch. It is defined in the socket.h. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15libbpf: fix check for presence of associated BTF for map creationAndrii Nakryiko1-4/+5
Kernel internally checks that either key or value type ID is specified, before using btf_fd. Do the same in libbpf's map creation code for determining when to retry map creation w/o BTF. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: fba01a0689a9 ("libbpf: use negative fd to specify missing BTF") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15selftests/bpf: signedness bug in enable_all_controllers()Dan Carpenter1-1/+1
The "len" variable needs to be signed for the error handling to work properly. Fixes: 596092ef8bea ("selftests/bpf: enable all available cgroup v2 controllers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-11bpf: use libbpf_num_possible_cpus internallyHechao Li2-80/+10
Use the newly added bpf_num_possible_cpus() in bpftool and selftests and remove duplicate implementations. Signed-off-by: Hechao Li <hechaol@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-11selftests/bpf: remove bpf_util.h from BPF C progsHechao Li5-4/+6
Though currently there is no problem including bpf_util.h in kernel space BPF C programs, in next patch in this stack, I will reuse libbpf_num_possible_cpus() in bpf_util.h thus include libbpf.h in it, which will cause BPF C programs compile error. Therefore I will first remove bpf_util.h from all test BPF programs. This can also make it clear that bpf_util.h is a user-space utility while bpf_helpers.h is a kernel space utility. Signed-off-by: Hechao Li <hechaol@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-11bpf: add a new API libbpf_num_possible_cpus()Hechao Li3-0/+74
Adding a new API libbpf_num_possible_cpus() that helps user with per-CPU map operations. Signed-off-by: Hechao Li <hechaol@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-11selftests/bpf : clean up feature/ when make cleanHechao Li1-1/+2
An error "implicit declaration of function 'reallocarray'" can be thrown with the following steps: $ cd tools/testing/selftests/bpf $ make clean && make CC=<Path to GCC 4.8.5> $ make clean && make CC=<Path to GCC 7.x> The cause is that the feature folder generated by GCC 4.8.5 is not removed, leaving feature-reallocarray being 1, which causes reallocarray not defined when re-compliing with GCC 7.x. This diff adds feature folder to EXTRA_CLEAN to avoid this problem. v2: Rephrase the commit message. Signed-off-by: Hechao Li <hechaol@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-11selftests/bpf: fix constness of source arg for bpf helpersAndrii Nakryiko1-2/+2
Fix signature of bpf_probe_read and bpf_probe_write_user to mark source pointer as const. This causes warnings during compilation for applications relying on those helpers. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-11libbpf: remove qidconf and better support external bpf programs.Jonathan Lemon1-75/+28
Use the recent change to XSKMAP bpf_map_lookup_elem() to test if there is a xsk present in the map instead of duplicating the work with qidconf. Fix things so callers using XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD bypass any internal bpf maps, so xsk_socket__{create|delete} works properly. Clean up error handling path. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Tested-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-11tools/bpf: Add bpf_map_lookup_elem selftest for xskmapJonathan Lemon1-0/+18
Check that bpf_map_lookup_elem lookup and structure access operats correctly. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-11bpf/tools: sync bpf.hJonathan Lemon1-0/+4
Sync uapi/linux/bpf.h Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-11bpf: Allow bpf_map_lookup_elem() on an xskmapJonathan Lemon1-15/+0
Currently, the AF_XDP code uses a separate map in order to determine if an xsk is bound to a queue. Instead of doing this, have bpf_map_lookup_elem() return a xdp_sock. Rearrange some xdp_sock members to eliminate structure holes. Remove selftest - will be added back in later patch. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller68-490/+5496
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-05-31 The following pull-request contains BPF updates for your *net-next* tree. Lots of exciting new features in the first PR of this developement cycle! The main changes are: 1) misc verifier improvements, from Alexei. 2) bpftool can now convert btf to valid C, from Andrii. 3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs. This feature greatly improves BPF speed on 32-bit architectures. From Jiong. 4) cgroups will now auto-detach bpf programs. This fixes issue of thousands bpf programs got stuck in dying cgroups. From Roman. 5) new bpf_send_signal() helper, from Yonghong. 6) cgroup inet skb programs can signal CN to the stack, from Lawrence. 7) miscellaneous cleanups, from many developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-01selftests/bpf: measure RTT from xdp using xdpingAlan Maguire6-2/+558
xdping allows us to get latency estimates from XDP. Output looks like this: ./xdping -I eth4 192.168.55.8 Setting up XDP for eth4, please wait... XDP setup disrupts network connectivity, hit Ctrl+C to quit Normal ping RTT data [Ignore final RTT; it is distorted by XDP using the reply] PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data. 64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms 64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms 64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms 64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms 4 packets transmitted, 4 received, 0% packet loss, time 3079ms rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms XDP RTT data: 64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms 64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms 64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms 64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms The xdping program loads the associated xdping_kern.o BPF program and attaches it to the specified interface. If run in client mode (the default), it will add a map entry keyed by the target IP address; this map will store RTT measurements, current sequence number etc. Finally in client mode the ping command is executed, and the xdping BPF program will use the last ICMP reply, reformulate it as an ICMP request with the next sequence number and XDP_TX it. After the reply to that request is received we can measure RTT and repeat until the desired number of measurements is made. This is why the sequence numbers in the normal ping are 1, 2, 3 and 8. We XDP_TX a modified version of ICMP reply 4 and keep doing this until we get the 4 replies we need; hence the networking stack only sees reply 8, where we have XDP_PASSed it upstream since we are done. In server mode (-s), xdping simply takes ICMP requests and replies to them in XDP rather than passing the request up to the networking stack. No map entry is required. xdping can be run in native XDP mode (the default, or specified via -N) or in skb mode (-S). A test program test_xdping.sh exercises some of these options. Note that native XDP does not seem to XDP_TX for veths, hence -N is not tested. Looking at the code, it looks like XDP_TX is supported so I'm not sure if that's expected. Running xdping in native mode for ixgbe as both client and server works fine. Changes since v4 - close fds on cleanup (Song Liu) Changes since v3 - fixed seq to be __be16 (Song Liu) - fixed fd checks in xdping.c (Song Liu) Changes since v2 - updated commit message to explain why seq number of last ICMP reply is 8 not 4 (Song Liu) - updated types of seq number, raddr and eliminated csum variable in xdpclient/xdpserver functions as it was not needed (Song Liu) - added XDPING_DEFAULT_COUNT definition and usage specification of default/max counts (Song Liu) Changes since v1 - moved from RFC to PATCH - removed unused variable in ipv4_csum() (Song Liu) - refactored ICMP checks into icmp_check() function called by client and server programs and reworked client and server programs due to lack of shared code (Song Liu) - added checks to ensure that SKB and native mode are not requested together (Song Liu) Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller57-286/+634
The phylink conflict was between a bug fix by Russell King to make sure we have a consistent PHY interface mode, and a change in net-next to pull some code in phylink_resolve() into the helper functions phylink_mac_link_{up,down}() On the dp83867 side it's mostly overlapping changes, with the 'net' side removing a condition that was supposed to trigger for RGMII but because of how it was coded never actually could trigger. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds5-9/+371
Pull networking fixes from David Miller: 1) Fix OOPS during nf_tables rule dump, from Florian Westphal. 2) Use after free in ip_vs_in, from Yue Haibing. 3) Fix various kTLS bugs (NULL deref during device removal resync, netdev notification ignoring, etc.) From Jakub Kicinski. 4) Fix ipv6 redirects with VRF, from David Ahern. 5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet. 6) Missing memory allocation failure check in ip6_ra_control(), from Gen Zhang. And likewise fix ip_ra_control(). 7) TX clean budget logic error in aquantia, from Igor Russkikh. 8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet. 9) Double frees in mlx5, from Parav Pandit. 10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit. 11) Fix botched register access in mvpp2, from Antoine Tenart. 12) Use after free in napi_gro_frags(), from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits) net: correct zerocopy refcnt with udp MSG_MORE ethtool: Check for vlan etype or vlan tci when parsing flow_rule net: don't clear sock->sk early to avoid trouble in strparser net-gro: fix use-after-free read in napi_gro_frags() net: dsa: tag_8021q: Create a stable binary format net: dsa: tag_8021q: Change order of rx_vid setup net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value ipv4: tcp_input: fix stack out of bounds when parsing TCP options. mlxsw: spectrum: Prevent force of 56G mlxsw: spectrum_acl: Avoid warning after identical rules insertion net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT r8169: fix MAC address being lost in PCI D3 net: core: support XDP generic on stacked devices. netvsc: unshare skb in VF rx handler udp: Avoid post-GRO UDP checksum recalculation net: phy: dp83867: Set up RGMII TX delay net: phy: dp83867: do not call config_init twice net: phy: dp83867: increase SGMII autoneg timer duration net: phy: dp83867: fix speed 10 in sgmii mode net: phy: marvell10g: report if the PHY fails to boot firmware ...
2019-05-30selftests/net: add TFO key rotation selftestJason Baron4-0/+394
Demonstrate how the primary and backup TFO keys can be rotated while minimizing the number of client cookies that are rejected. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: sched: Introduce act_ctinfo actionKevin 'ldir' Darbyshire-Bryant1-0/+1
ctinfo is a new tc filter action module. It is designed to restore information contained in firewall conntrack marks to other packet fields and is typically used on packet ingress paths. At present it has two independent sub-functions or operating modes, DSCP restoration mode & skb mark restoration mode. The DSCP restore mode: This mode copies DSCP values that have been placed in the firewall conntrack mark back into the IPv4/v6 diffserv fields of relevant packets. The DSCP restoration is intended for use and has been found useful for restoring ingress classifications based on egress classifications across links that bleach or otherwise change DSCP, typically home ISP Internet links. Restoring DSCP on ingress on the WAN link allows qdiscs such as but by no means limited to CAKE to shape inbound packets according to policies that are easier to set & mark on egress. Ingress classification is traditionally a challenging task since iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT lookups, hence are unable to see internal IPv4 addresses as used on the typical home masquerading gateway. Thus marking the connection in some manner on egress for later restoration of classification on ingress is easier to implement. Parameters related to DSCP restore mode: dscpmask - a 32 bit mask of 6 contiguous bits and indicate bits of the conntrack mark field contain the DSCP value to be restored. statemask - a 32 bit mask of (usually) 1 bit length, outside the area specified by dscpmask. This represents a conditional operation flag whereby the DSCP is only restored if the flag is set. This is useful to implement a 'one shot' iptables based classification where the 'complicated' iptables rules are only run once to classify the connection on initial (egress) packet and subsequent packets are all marked/restored with the same DSCP. A mask of zero disables the conditional behaviour ie. the conntrack mark DSCP bits are always restored to the ip diffserv field (assuming the conntrack entry is found & the skb is an ipv4/ipv6 type) e.g. dscpmask 0xfc000000 statemask 0x01000000 |----0xFC----conntrack mark----000000---| | Bits 31-26 | bit 25 | bit24 |~~~ Bit 0| | DSCP | unused | flag |unused | |-----------------------0x01---000000---| | | | | ---| Conditional flag v only restore if set |-ip diffserv-| | 6 bits | |-------------| The skb mark restore mode (cpmark): This mode copies the firewall conntrack mark to the skb's mark field. It is completely the functional equivalent of the existing act_connmark action with the additional feature of being able to apply a mask to the restored value. Parameters related to skb mark restore mode: mask - a 32 bit mask applied to the firewall conntrack mark to mask out bits unwanted for restoration. This can be useful where the conntrack mark is being used for different purposes by different applications. If not specified and by default the whole mark field is copied (i.e. default mask of 0xffffffff) e.g. mask 0x00ffffff to mask out the top 8 bits being used by the aforementioned DSCP restore mode. |----0x00----conntrack mark----ffffff---| | Bits 31-24 | | | DSCP & flag| some value here | |---------------------------------------| | | v |------------skb mark-------------------| | | | | zeroed | | |---------------------------------------| Overall parameters: zone - conntrack zone control - action related control (reclassify | pipe | drop | continue | ok | goto chain <CHAIN_INDEX>) Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30libbpf: reduce unnecessary line wrappingAndrii Nakryiko1-36/+16
There are a bunch of lines of code or comments that are unnecessary wrapped into multi-lines. Fix that without violating any code guidelines. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: typo and formatting fixesAndrii Nakryiko1-8/+7
A bunch of typo and formatting fixes. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: simplify two pieces of logicAndrii Nakryiko1-3/+1
Extra check for type is unnecessary in first case. Extra zeroing is unnecessary, as snprintf guarantees that it will zero-terminate string. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: use negative fd to specify missing BTFAndrii Nakryiko1-6/+7
0 is a valid FD, so it's better to initialize it to -1, as is done in other places. Also, technically, BTF type ID 0 is valid (it's a VOID type), so it's more reliable to check btf_fd, instead of btf_key_type_id, to determine if there is any BTF associated with a map. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: fix error code returned on corrupted ELFAndrii Nakryiko1-1/+1
All of libbpf errors are negative, except this one. Fix it. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: check map name retrieved from ELFAndrii Nakryiko1-0/+5
Validate there was no error retrieving symbol name corresponding to a BPF map. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: simplify endianness checkAndrii Nakryiko1-25/+12
Rewrite endianness check to use "more canonical" way, using compiler-defined macros, similar to few other places in libbpf. It also is more obvious and shorter. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: preserve errno before calling into user callbackAndrii Nakryiko1-4/+4
pr_warning ultimately may call into user-provided callback function, which can clobber errno value, so we need to save it before that. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-30libbpf: fix detection of corrupted BPF instructions sectionAndrii Nakryiko1-5/+7
Ensure that size of a section w/ BPF instruction is exactly a multiple of BPF instruction size. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-29Merge tag 'linux-kselftest-5.2-rc3' of ↵Linus Torvalds5-9/+38
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - Alexandre Belloni's fixes to rtc regressions introduced in kselftest Makefile test run output refactoring work from Kees Cook. - ftrace test checkbashisms fixes from Masami Hiramatsu * tag 'linux-kselftest-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: rtc: rtctest: specify timeouts selftests/harness: Allow test to configure timeout selftests/ftrace: Add checkbashisms meta-testcase selftests/ftrace: Make a script checkbashisms clean
2019-05-29libbpf: prevent overwriting of log_level in bpf_object__load_progs()Quentin Monnet1-1/+1
There are two functions in libbpf that support passing a log_level parameter for the verifier for loading programs: bpf_object__load_xattr() and bpf_prog_load_xattr(). Both accept an attribute object containing the log_level, and apply it to the programs to load. It turns out that to effectively load the programs, the latter function eventually relies on the former. This was not taken into account when adding support for log_level in bpf_object__load_xattr(), and the log_level passed to bpf_prog_load_xattr() later gets overwritten with a zero value, thus disabling verifier logs for the program in all cases: bpf_prog_load_xattr() // prog->log_level = attr1->log_level; -> bpf_object__load() // attr2->log_level = 0; -> bpf_object__load_xattr() // <pass prog and attr2> -> bpf_object__load_progs() // prog->log_level = attr2->log_level; Fix this by OR-ing the log_level in bpf_object__load_progs(), instead of overwriting it. v2: Fix commit log description (confusion on function names in v1). Fixes: 60276f984998 ("libbpf: add bpf_object__load_xattr() API function to pass log_level") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-29selftests/bpf: fix compilation error for flow_dissector.cAlan Maguire1-0/+114
When building the tools/testing/selftest/bpf subdirectory, (running both a local directory "make" and a "make -C tools/testing/selftests/bpf") I keep hitting the following compilation error: prog_tests/flow_dissector.c: In function ‘create_tap’: prog_tests/flow_dissector.c:150:38: error: ‘IFF_NAPI’ undeclared (first use in this function) .ifr_flags = IFF_TAP | IFF_NO_PI | IFF_NAPI | IFF_NAPI_FRAGS, ^ prog_tests/flow_dissector.c:150:38: note: each undeclared identifier is reported only once for each function it appears in prog_tests/flow_dissector.c:150:49: error: ‘IFF_NAPI_FRAGS’ undeclared Adding include/uapi/linux/if_tun.h to tools/include/uapi/linux resolves the problem and ensures the compilation of the file does not depend on having up-to-date kernel headers locally. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-29selftests/net: ipv6 flowlabelWillem de Bruijn5-2/+453
Test the IPv6 flowlabel control and datapath interfaces: Acquire and release the right to use flowlabels with socket option IPV6_FLOWLABEL_MGR. Then configure flowlabels on send and read them on recv with cmsg IPV6_FLOWINFO. Also verify auto-flowlabel if not explicitly set. This helped identify the issue fixed in commit 95c169251bf73 ("ipv6: invert flowlabel sharing check in process and user mode") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29selftests: pmtu: Fix encapsulating device in pmtu_vti6_link_change_mtuStefano Brivio1-7/+7
In the pmtu_vti6_link_change_mtu test, both local and remote addresses for the vti6 tunnel are assigned to the same address given to the dummy interface that we use as encapsulating device with a known MTU. This works as long as the dummy interface is actually selected, via rt6_lookup(), as encapsulating device. But if the remote address of the tunnel is a local address too, the loopback interface could also be selected, and there's nothing wrong with it. This is what some older -stable kernels do (3.18.z, at least), and nothing prevents us from subtly changing FIB implementation to revert back to that behaviour in the future. Define an IPv6 prefix instead, and use two separate addresses as local and remote for vti6, so that the encapsulating device can't be a loopback interface. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: 1fad59ea1c34 ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>