path: root/tools
Age | Commit message | Author | Files | Lines
2020-03-30 | bpf: Add socket assign support | Joe Stringer | 1 | -1/+24
Add support for TPROXY via a new bpf helper, bpf_sk_assign(). This helper requires the BPF program to discover the socket via a call to bpf_sk*_lookup_*(), then pass this socket to the new helper. The helper takes its own reference to the socket in addition to any existing reference that may or may not currently be obtained for the duration of BPF processing. For the destination socket to receive the traffic, the traffic must be routed towards that socket via local route. The simplest example route is below, but in practice you may want to route traffic more narrowly (eg by CIDR): $ ip route add local default dev lo This patch avoids trying to introduce an extra bit into the skb->sk, as that would require more invasive changes to all code interacting with the socket to ensure that the bit is handled correctly, such as all error-handling cases along the path from the helper in BPF through to the orphan path in the input. Instead, we opt to use the destructor variable to switch on the prefetch of the socket. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
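As a rough illustration (not taken from the patch itself), a TC ingress program might combine the lookup and assign helpers as sketched below; the tuple extraction is elided and the section name is an assumption:

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("classifier")
  int steer_to_proxy(struct __sk_buff *skb)
  {
      struct bpf_sock_tuple tuple = {};
      struct bpf_sock *sk;

      /* ... fill tuple.ipv4 (or tuple.ipv6) from the packet headers ... */

      /* Find the local (e.g. TPROXY) listener for this flow. */
      sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
                             BPF_F_CURRENT_NETNS, 0);
      if (!sk)
          return TC_ACT_OK;

      /* Steer the skb to it; the helper takes its own socket reference. */
      bpf_sk_assign(skb, sk, 0);
      bpf_sk_release(sk);
      return TC_ACT_OK;
  }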
2020-03-30 | bpf: lsm: Add selftests for BPF_PROG_TYPE_LSM | KP Singh | 3 | -0/+136
* Load/attach a BPF program that hooks to file_mprotect (int) and bprm_committed_creds (void). * Perform an action that triggers the hook. * Verify if the audit event was received using the shared global variables for the process executed. * Verify if the mprotect returns a -EPERM. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Florent Revest <revest@google.com> Reviewed-by: Thomas Garnier <thgarnie@google.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200329004356.27286-8-kpsingh@chromium.org
2020-03-30 | tools/libbpf: Add support for BPF_PROG_TYPE_LSM | KP Singh | 4 | -5/+44
Since BPF_PROG_TYPE_LSM uses the same attaching mechanism as BPF_PROG_TYPE_TRACING, the common logic is refactored into a static function bpf_program__attach_btf_id. A new API call bpf_program__attach_lsm is still added to avoid userspace conflicts if this ever changes in the future. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Florent Revest <revest@google.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200329004356.27286-7-kpsingh@chromium.org
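A minimal userspace sketch of the new attach API; the object path and section title below are illustrative assumptions, not part of the patch:

  #include <bpf/libbpf.h>

  int attach_lsm_prog(void)
  {
      /* "lsm.bpf.o" and the "lsm/file_mprotect" title are made up here. */
      struct bpf_object *obj = bpf_object__open_file("lsm.bpf.o", NULL);
      struct bpf_program *prog;
      struct bpf_link *link;

      if (libbpf_get_error(obj) || bpf_object__load(obj))
          return -1;

      prog = bpf_object__find_program_by_title(obj, "lsm/file_mprotect");
      if (!prog)
          return -1;

      link = bpf_program__attach_lsm(prog);   /* the new API call */
      return libbpf_get_error(link) ? -1 : 0;
  }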
2020-03-30 | bpf: Introduce BPF_PROG_TYPE_LSM | KP Singh | 2 | -0/+3
Introduce types and configs for bpf programs that can be attached to LSM hooks. The programs can be enabled by the config option CONFIG_BPF_LSM. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Florent Revest <revest@google.com> Reviewed-by: Thomas Garnier <thgarnie@google.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
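For context, a BPF-side LSM program for one of these hooks looks roughly like the sketch below (modeled on the file_mprotect hook; the monitored_pid policy is an illustrative assumption):

  #include "vmlinux.h"
  #include <errno.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  int monitored_pid;  /* set from userspace; illustrative */

  SEC("lsm/file_mprotect")
  int BPF_PROG(mprotect_audit, struct vm_area_struct *vma,
               unsigned long reqprot, unsigned long prot, int ret)
  {
      if (ret)  /* an earlier LSM hook already rejected the call */
          return ret;
      /* Example policy: deny mprotect() for one monitored process. */
      if ((bpf_get_current_pid_tgid() >> 32) == monitored_pid)
          return -EPERM;
      return 0;
  }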
2020-03-30 | selftests: Add test for overriding global data value before load | Toke Høiland-Jørgensen | 2 | -1/+62
This adds a test to exercise the new bpf_map__set_initial_value() function. The test simply overrides the global data section with all zeroes, and checks that the new value makes it into the kernel map on load. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200329132253.232541-2-toke@redhat.com
2020-03-30 | libbpf: Add setter for initial value for internal maps | Toke Høiland-Jørgensen | 3 | -0/+14
For internal maps (most notably the maps backing global variables), libbpf uses an internal mmaped area to store the data after opening the object. This data is subsequently copied into the kernel map when the object is loaded. This adds a function to set a new value for that data, which can be used to override it before it is loaded into the kernel. This is especially relevant for RODATA maps, since those are frozen on load. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200329132253.232541-1-toke@redhat.com
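A hedged sketch of how the setter is meant to be used between open and load; the object name, map name and config struct are illustrative, and the data size must match the backing section's size:

  #include <bpf/libbpf.h>

  struct my_cfg { int verbose; int sample_rate; };  /* must mirror the .rodata layout/size */

  int load_with_cfg(const struct my_cfg *cfg)
  {
      struct bpf_object *obj = bpf_object__open_file("my_prog.bpf.o", NULL);
      struct bpf_map *rodata;

      if (libbpf_get_error(obj))
          return -1;

      /* The internal map name is derived from the object name. */
      rodata = bpf_object__find_map_by_name(obj, "my_prog.rodata");
      if (!rodata || bpf_map__set_initial_value(rodata, cfg, sizeof(*cfg)))
          return -1;

      /* RODATA maps are frozen on load, so the override must happen first. */
      return bpf_object__load(obj);
  }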
2020-03-29 | selftests/bpf: Add tests for attaching XDP programs | Toke Høiland-Jørgensen | 1 | -0/+62
This adds tests for the various replacement operations using IFLA_XDP_EXPECTED_FD. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158515700967.92963.15098921624731968356.stgit@toke.dk
2020-03-29 | libbpf: Add function to set link XDP fd while specifying old program | Toke Høiland-Jørgensen | 3 | -1/+42
This adds a new function to set the XDP fd while specifying the FD of the program to replace, using the newly added IFLA_XDP_EXPECTED_FD netlink parameter. The new function uses the opts struct mechanism to be extendable in the future. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158515700857.92963.7052131201257841700.stgit@toke.dk
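Roughly, a caller replacing one XDP program with another only if the expected one is still attached would look like this; a sketch under the assumption that the opts struct carries an old_fd field, as described above:

  #include <bpf/libbpf.h>
  #include <linux/if_link.h>

  int replace_xdp_prog(int ifindex, int new_fd, int expected_old_fd)
  {
      DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts,
                          .old_fd = expected_old_fd);

      /* Fails (e.g. -EEXIST) if another program was attached meanwhile. */
      return bpf_set_link_xdp_fd_opts(ifindex, new_fd,
                                      XDP_FLAGS_REPLACE, &opts);
  }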
2020-03-29 | tools: Add EXPECTED_FD-related definitions in if_link.h | Toke Høiland-Jørgensen | 1 | -1/+3
This adds the IFLA_XDP_EXPECTED_FD netlink attribute definition and the XDP_FLAGS_REPLACE flag to if_link.h in tools/include. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158515700747.92963.8615391897417388586.stgit@toke.dk
2020-03-28 | libbpf, xsk: Init all ring members in xsk_umem__create and xsk_socket__create | Fletcher Dunn | 1 | -2/+14
Fix a sharp edge in xsk_umem__create and xsk_socket__create. Almost all of the members of the ring buffer structs are initialized, but the "cached_xxx" variables are not all initialized. The caller is required to zero them. This is needlessly dangerous. The results if you don't do it can be very bad. For example, they can cause xsk_prod_nb_free and xsk_cons_nb_avail to return values greater than the size of the queue. xsk_ring_cons__peek can return an index that does not refer to an item that has been queued. I have confirmed that without this change, my program misbehaves unless I memset the ring buffers to zero before calling the function. Afterwards, my program works without (or with) the memset. Signed-off-by: Fletcher Dunn <fletcherd@valvesoftware.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/85f12913cde94b19bfcb598344701c38@valvesoftware.com
2020-03-28 | bpf: Add selftest cases for ctx_or_null argument type | Daniel Borkmann | 1 | -0/+105
Add various tests to make sure the verifier keeps catching them:

  # ./test_verifier
  [...]
  #230/p pass ctx or null check, 1: ctx OK
  #231/p pass ctx or null check, 2: null OK
  #232/p pass ctx or null check, 3: 1 OK
  #233/p pass ctx or null check, 4: ctx - const OK
  #234/p pass ctx or null check, 5: null (connect) OK
  #235/p pass ctx or null check, 6: null (bind) OK
  #236/p pass ctx or null check, 7: ctx (bind) OK
  #237/p pass ctx or null check, 8: null (bind) OK
  [...]
  Summary: 1595 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/c74758d07b1b678036465ef7f068a49e9efd3548.1585323121.git.daniel@iogearbox.net
2020-03-28 | bpf: Enable bpf cgroup hooks to retrieve cgroup v2 and ancestor id | Daniel Borkmann | 1 | -1/+20
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(), recvmsg() and bind-related hooks in order to retrieve the cgroup v2 context which can then be used as part of the key for BPF map lookups, for example. Given these hooks operate in process context 'current' is always valid and pointing to the app that is performing mentioned syscalls if it's subject to a v2 cgroup. Also with same motivation of commit 7723628101aa ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper") enable retrieval of ancestor from current so the cgroup id can be used for policy lookups which can then forbid connect() / bind(), for example. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
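A sketch of the kind of map-lookup policy this enables from a connect hook; the map layout and the deny semantics are illustrative assumptions:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  struct {
      __uint(type, BPF_MAP_TYPE_HASH);
      __uint(max_entries, 1024);
      __type(key, __u64);   /* cgroup v2 id */
      __type(value, __u8);  /* non-zero = deny connect() */
  } connect_deny SEC(".maps");

  SEC("cgroup/connect4")
  int connect4_policy(struct bpf_sock_addr *ctx)
  {
      __u64 cgid = bpf_get_current_cgroup_id();
      __u8 *deny = bpf_map_lookup_elem(&connect_deny, &cgid);

      return (deny && *deny) ? 0 : 1;  /* returning 0 rejects the connect() */
  }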
2020-03-28 | bpf: Add netns cookie and enable it for bpf cgroup hooks | Daniel Borkmann | 1 | -1/+15
In Cilium we're mainly using BPF cgroup hooks today in order to implement kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*), ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic between Cilium managed nodes. While this works in its current shape and avoids packet-level NAT for inter Cilium managed node traffic, there is one major limitation we're facing today, that is, lack of netns awareness. In Kubernetes, the concept of Pods (which hold one or multiple containers) has been built around network namespaces, so while we can use the global scope of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing NodePort ports on loopback addresses), we also have the need to differentiate between initial network namespaces and non-initial one. For example, ExternalIP services mandate that non-local service IPs are not to be translated from the host (initial) network namespace as one example. Right now, we have an ugly work-around in place where non-local service IPs for ExternalIP services are not xlated from connect() and friends BPF hooks but instead via less efficient packet-level NAT on the veth tc ingress hook for Pod traffic. On top of determining whether we're in initial or non-initial network namespace we also have a need for a socket-cookie like mechanism for network namespaces scope. Socket cookies have the nice property that they can be combined as part of the key structure e.g. for BPF LRU maps without having to worry that the cookie could be recycled. We are planning to use this for our sessionAffinity implementation for services. Therefore, add a new bpf_get_netns_cookie() helper which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would provide the cookie for the initial network namespace while passing the context instead of NULL would provide the cookie from the application's network namespace. We're using a hole, so no size increase; the assignment happens only once. Therefore this allows for a comparison on initial namespace as well as regular cookie usage as we have today with socket cookies. We could later on enable this helper for other program types as well as we would see need. (*) Both externalTrafficPolicy={Local|Cluster} types [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
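A short sketch of the initial-vs-non-initial netns check the helper enables; the policy branch is an illustrative assumption:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("cgroup/connect4")
  int connect4_ns_aware(struct bpf_sock_addr *ctx)
  {
      __u64 init_cookie = bpf_get_netns_cookie(NULL); /* initial netns */
      __u64 my_cookie   = bpf_get_netns_cookie(ctx);  /* caller's netns */

      if (my_cookie != init_cookie) {
          /* non-initial netns (e.g. a Pod): apply service translation here */
      }
      return 1;  /* allow */
  }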
2020-03-26 | bpf: Test_verifier, #70 error message updates for 32-bit right shift | John Fastabend | 1 | -4/+2
After changes to add update_reg_bounds after ALU ops and adding ALU32 bounds tracking, the error message is changed in the 32-bit right shift tests.

Test "#70/u bounds check after 32-bit right shift with 64-bit input FAIL" now fails with,

  Unexpected error message!
  EXP: R0 invalid mem access
  RES: func#0 @0
  7: (b7) r1 = 2
  8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP2 R10=fp0 fp-8_w=mmmmmmmm
  8: (67) r1 <<= 31
  9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967296 R10=fp0 fp-8_w=mmmmmmmm
  9: (74) w1 >>= 31
  10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
  10: (14) w1 -= 2
  11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967294 R10=fp0 fp-8_w=mmmmmmmm
  11: (0f) r0 += r1
  math between map_value pointer and 4294967294 is not allowed

And test "#70/p bounds check after 32-bit right shift with 64-bit input FAIL" now fails with,

  Unexpected error message!
  EXP: R0 invalid mem access
  RES: func#0 @0
  7: (b7) r1 = 2
  8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv2 R10=fp0 fp-8_w=mmmmmmmm
  8: (67) r1 <<= 31
  9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967296 R10=fp0 fp-8_w=mmmmmmmm
  9: (74) w1 >>= 31
  10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv0 R10=fp0 fp-8_w=mmmmmmmm
  10: (14) w1 -= 2
  11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967294 R10=fp0 fp-8_w=mmmmmmmm
  11: (0f) r0 += r1
  last_idx 11 first_idx 0
  regs=2 stack=0 before 10: (14) w1 -= 2
  regs=2 stack=0 before 9: (74) w1 >>= 31
  regs=2 stack=0 before 8: (67) r1 <<= 31
  regs=2 stack=0 before 7: (b7) r1 = 2
  math between map_value pointer and 4294967294 is not allowed

Before this series we did not trip the "math between map_value pointer..." error because check_reg_sane_offset is never called in adjust_ptr_min_max_vals(). Instead we have a register state that looks like this at line 11*,

  11: R0_w=map_value(id=0,off=0,ks=8,vs=8,
        smin_value=0,smax_value=0,
        umin_value=0,umax_value=0,
        var_off=(0x0; 0x0))
      R1_w=invP(id=0,
        smin_value=0,smax_value=4294967295,
        umin_value=0,umax_value=4294967295,
        var_off=(0xfffffffe; 0x0))
      R10=fp(id=0,off=0,
        smin_value=0,smax_value=0,
        umin_value=0,umax_value=0,
        var_off=(0x0; 0x0)) fp-8_w=mmmmmmmm
  11: (0f) r0 += r1

In R1 'smin_val != smax_val' yet we have a tnum_const as seen by 'var_off(0xfffffffe; 0x0))' with a 0x0 mask. So we hit this check in adjust_ptr_min_max_vals()

  if ((known && (smin_val != smax_val || umin_val != umax_val)) ||
      smin_val > smax_val || umin_val > umax_val) {
          /* Taint dst register if offset had invalid bounds derived from
           * e.g. dead branches.
           */
          __mark_reg_unknown(env, dst_reg);
          return 0;
  }

So we don't throw an error here and instead only throw an error later in the verification when the memory access is made. The root cause in the verifier without alu32 bounds tracking is having 'umin_value = 0' and 'umax_value = U64_MAX' from BPF_SUB, which we set when 'umin_value < umax_val' here,

  if (dst_reg->umin_value < umax_val) {
          /* Overflow possible, we know nothing */
          dst_reg->umin_value = 0;
          dst_reg->umax_value = U64_MAX;
  } else { ...}

Later in adjust_scalar_min_max_vals we previously did a coerce_reg_to_size() which will clamp the U64_MAX to U32_MAX by truncating to 32 bits. But either way, without a call to update_reg_bounds the less precise bounds tracking will fall out of the alu op verification. After the latest changes we now exit adjust_scalar_min_max_vals with the more precise umin value, due to zero extension propagating bounds from alu32 bounds into alu64 bounds and then calling update_reg_bounds. This then causes the verifier to trigger an earlier error and we get the error in the output above.

This patch updates the tests to reflect the new error message.

* I have a local patch to print the entire verifier state regardless of whether we believe it is a constant, so we can get a full picture of the state. Usually if tnum_is_const() then bounds are also smin=smax, etc. but this is not always true and is a bit subtle. Being able to see these states helps understand dataflow imo. Let me know if we want something similar upstream.

Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158507161475.15666.3061518385241144063.stgit@john-Precision-5820-Tower
2020-03-26 | libbpf: Don't allocate 16M for log buffer by default | Stanislav Fomichev | 2 | -13/+29
For each prog/btf load we allocate and free 16 megs of verifier buffer. On production systems it doesn't really make sense because the programs/btf have gone through extensive testing and are (mostly) guaranteed to load successfully. Let's assume the successful case by default and skip buffer allocation on the first try. If there is an error, start with BPF_LOG_BUF_SIZE and double it on each ENOSPC iteration.

v3: * Return -ENOMEM when the log buffer can't be allocated (Andrii Nakryiko)
v2: * Don't allocate the buffer at all on the first try (Andrii Nakryiko)

Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200325195521.112210-1-sdf@google.com
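The strategy described above boils down to a retry loop along these lines; this is a sketch built around the bpf_load_program_xattr() wrapper, not libbpf's exact internal code:

  #include <errno.h>
  #include <stdlib.h>
  #include <bpf/bpf.h>

  int load_prog_lazily(const struct bpf_load_program_attr *attr)
  {
      size_t log_sz = 0;
      char *log_buf = NULL;
      int fd;

      for (;;) {
          fd = bpf_load_program_xattr(attr, log_buf, log_sz);
          if (fd >= 0)
              break;                  /* common case: no log needed */
          if (log_buf && errno != ENOSPC)
              break;                  /* real failure, log is complete */
          /* first failure: allocate; ENOSPC: double and retry */
          log_sz = log_sz ? log_sz * 2 : BPF_LOG_BUF_SIZE;
          free(log_buf);
          log_buf = malloc(log_sz);
          if (!log_buf)
              return -ENOMEM;
      }
      /* a real caller would print log_buf on failure before freeing it */
      free(log_buf);
      return fd;
  }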
2020-03-26 | libbpf: Remove unused parameter `def` to get_map_field_int | Tobias Klauser | 1 | -10/+6
Has been unused since commit ef99b02b23ef ("libbpf: capture value in BTF type info for BTF-defined map defs"). Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200325113655.19341-1-tklauser@distanz.ch
2020-03-24 | samples, bpf: Move read_trace_pipe to trace_helpers | Daniel T. Lee | 2 | -0/+24
To reduce the reliance of trace samples (trace*_user) on bpf_load, move read_trace_pipe to trace_helpers. By moving this bpf_loader helper elsewhere, trace functions can be easily migrated to libbpf. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200321100424.1593964-2-danieltimlee@gmail.com
2020-03-23 | bpf: Add tests for bpf_sk_storage to bpf_tcp_ca | Martin KaFai Lau | 2 | -8/+47
This patch adds a test to exercise the bpf_sk_storage_get() and bpf_sk_storage_delete() helpers from bpf_dctcp.c. The setup and check on the sk_storage are done immediately before and after the connect(). This patch also takes this chance to move the pthread_create() after the connect() has been done. That removes the need for the "wait_thread" label. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200320152107.2169904-1-kafai@fb.com
2020-03-20 | selftests/bpf: Fix mix of tabs and spaces | Bill Wendling | 1 | -1/+1
Clang's -Wmisleading-indentation warns about misleading indentations if there's a mixture of spaces and tabs. Remove extraneous spaces. Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200320201510.217169-1-morbo@google.com
2020-03-20 | bpftool: Add struct_ops support | Martin KaFai Lau | 5 | -1/+743
This patch adds struct_ops support to the bpftool.

To recap a bit on the recent bpf_struct_ops feature on the kernel side: it currently supports "struct tcp_congestion_ops" to be implemented in bpf. At a high level, bpf_struct_ops is a struct_ops map populated with a number of bpf progs. bpf_struct_ops currently supports the "struct tcp_congestion_ops". However, the bpf_struct_ops design is generic enough that other kernel struct ops can be supported in the future.

Although struct_ops is map+progs at a high level, there are differences in details. For example,

1) After registering a struct_ops, the struct_ops is held by the kernel subsystem (e.g. tcp-cc). Thus, there is no need to pin a struct_ops map or its progs in order to keep them around.

2) To iterate all struct_ops in a system, it iterates all maps in type BPF_MAP_TYPE_STRUCT_OPS. BPF_MAP_TYPE_STRUCT_OPS is the current usual filter. In the future, it may need to filter by other struct_ops specific properties, e.g. filter by tcp_congestion_ops or other kernel subsystem ops in the future.

3) struct_ops requires the running kernel to have BTF info. That allows more flexibility in handling other kernel structs, e.g. it can always dump the latest bpf_map_info.

4) Also, the "struct_ops" command is not intended to repeat all features already provided by "map" or "prog". For example, if there really is a need to pin the struct_ops map, the user can use the "map" cmd to do that.

While the first attempt was to reuse parts from map/prog.c, it ended up there was not a lot to share. The only obvious item is map_parse_fds(), but that still requires modifications to accommodate struct_ops map specific filtering (for the immediate and the future needs). Together with the earlier mentioned differences, it is better to part away from map/prog.c.

The initial set of subcmds is register, unregister, show, and dump. For register, it registers all struct_ops maps that can be found in an obj file. An option can be added in the future to specify a particular struct_ops map. Also, the common bpf_tcp_cc is stateless (e.g. bpf_cubic.c and bpf_dctcp.c); the "reuse map" feature is not implemented in this patch and it can be considered later also. For other subcmds, please see the man doc for details.
A sample output of dump:

  [root@arch-fb-vm1 bpf]# bpftool struct_ops dump name cubic
  [{
      "bpf_map_info": {
          "type": 26,
          "id": 64,
          "key_size": 4,
          "value_size": 256,
          "max_entries": 1,
          "map_flags": 0,
          "name": "cubic",
          "ifindex": 0,
          "btf_vmlinux_value_type_id": 18452,
          "netns_dev": 0,
          "netns_ino": 0,
          "btf_id": 52,
          "btf_key_type_id": 0,
          "btf_value_type_id": 0
      }
  },{
      "bpf_struct_ops_tcp_congestion_ops": {
          "refcnt": {
              "refs": {
                  "counter": 1
              }
          },
          "state": "BPF_STRUCT_OPS_STATE_INUSE",
          "data": {
              "list": {
                  "next": 0,
                  "prev": 0
              },
              "key": 0,
              "flags": 0,
              "init": "void (struct sock *) bictcp_init/prog_id:138",
              "release": "void (struct sock *) 0",
              "ssthresh": "u32 (struct sock *) bictcp_recalc_ssthresh/prog_id:141",
              "cong_avoid": "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140",
              "set_state": "void (struct sock *, u8) bictcp_state/prog_id:142",
              "cwnd_event": "void (struct sock *, enum tcp_ca_event) bictcp_cwnd_event/prog_id:139",
              "in_ack_event": "void (struct sock *, u32) 0",
              "undo_cwnd": "u32 (struct sock *) tcp_reno_undo_cwnd/prog_id:144",
              "pkts_acked": "void (struct sock *, const struct ack_sample *) bictcp_acked/prog_id:143",
              "min_tso_segs": "u32 (struct sock *) 0",
              "sndbuf_expand": "u32 (struct sock *) 0",
              "cong_control": "void (struct sock *, const struct rate_sample *) 0",
              "get_info": "size_t (struct sock *, u32, int *, union tcp_cc_info *) 0",
              "name": "bpf_cubic",
              "owner": 0
          }
      }
  }
  ]

Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200318171656.129650-1-kafai@fb.com
2020-03-20 | bpftool: Translate prog_id to its bpf prog_name | Martin KaFai Lau | 2 | -12/+107
The kernel struct_ops obj has the kernel's func ptrs implemented by bpf_progs. The bpf prog_id is stored as the value of the func ptr for introspection purposes. In a later patch, a struct_ops dump subcmd will be added to introspect these func ptrs. It is desired to print the actual bpf prog_name instead of only printing the prog_id. Since struct_ops is the only usecase storing prog_id in the func ptr, this patch adds a prog_id_as_func_ptr bool (default is false) to "struct btf_dumper" in order not to mis-interpret the ptr value for the other existing use-cases. While printing a func_ptr as a bpf prog_name, this patch also prefixes the bpf prog_name with the ptr's func_proto. [ Note that it is the ptr's func_proto instead of the bpf prog's func_proto ] It reuses the current btf_dump_func() to obtain the ptr's func_proto string. Here is an example from bpf_cubic.c: "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140" Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200318171650.129252-1-kafai@fb.com
2020-03-20 | bpftool: Print as a string for char array | Martin KaFai Lau | 1 | -0/+41
A char[] is currently printed as an integer array. This patch will print it as a string when
1) the array element type is a one-byte int,
2) the array element type has a BTF_INT_CHAR encoding or the array element type's name is "char", and
3) all characters are between (0x1f, 0x7f) and the array is terminated by a null character.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200318171643.129021-1-kafai@fb.com
2020-03-20 | bpftool: Print the enum's name instead of value | Martin KaFai Lau | 1 | -4/+36
This patch prints the enum's name if there is one found in the array of btf_enum. The commit 9eea98497951 ("bpf: fix BTF verification of enums") has details about an enum could have any power-of-2 size (up to 8 bytes). This patch also takes this chance to accommodate these non 4 byte enums. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200318171637.128862-1-kafai@fb.com
2020-03-17 | bpf, libbpf: Fix ___bpf_kretprobe_args1(x) macro definition | Wenbo Zhang | 1 | -1/+1
Use PT_REGS_RC instead of PT_REGS_RET to get ret correctly. Fixes: df8ff35311c8 ("libbpf: Merge selftests' bpf_trace_helpers.h into libbpf's bpf_tracing.h") Signed-off-by: Wenbo Zhang <ethercflow@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200315083252.22274-1-ethercflow@gmail.com
2020-03-17 | selftests/bpf: Reset process and thread affinity after each test/sub-test | Andrii Nakryiko | 2 | -1/+42
Some tests and sub-tests are setting "custom" thread/process affinity and don't reset it back. Instead of requiring each test to undo all this, ensure that thread affinity is restored by test_progs test runner itself. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200314013932.4035712-3-andriin@fb.com
2020-03-17 | selftests/bpf: Fix test_progs's parsing of test numbers | Andrii Nakryiko | 1 | -6/+7
When specifying disjoint set of tests, test_progs doesn't set skipped test's array elements to false. This leads to spurious execution of tests that should have been skipped. Fix it by explicitly initializing them to false. Fixes: 3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200314013932.4035712-2-andriin@fb.com
2020-03-17 | selftests/bpf: Fix race in tcp_rtt test | Andrii Nakryiko | 1 | -2/+2
Previous attempt to make tcp_rtt more robust introduced a new race, in which server_done might be set to true before server can actually accept any connection. Fix this by unconditionally waiting for accept(). Given socket is non-blocking, if there are any problems with client side, it should eventually close listening FD and let server thread exit with failure. Fixes: 4cd729fa022c ("selftests/bpf: Make tcp_rtt test more robust to failures") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200314013932.4035712-1-andriin@fb.com
2020-03-17 | selftests/bpf: Fix nanosleep for real this time | Andrii Nakryiko | 2 | -11/+7
Amazingly, some libc implementations don't call the __NR_nanosleep syscall from their nanosleep() APIs. Hammer it down with an explicit syscall() call and never get back to it again. Also simplify the code for timespec initialization. I verified that nanosleep is called w/ printk and in exactly the same Linux image that is used in Travis CI. So it should both sleep and call the correct syscall.

v1->v2:
- math is too hard, fix usec -> nsec conversion (Martin);
- test_vmlinux has an explicit nanosleep() call, convert that one as well.

Fixes: 4e1fd25d19e8 ("selftests/bpf: Fix usleep() implementation") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200314002743.3782677-1-andriin@fb.com
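The resulting helper is essentially the following; a sketch of the approach, not a verbatim copy of the selftest code:

  #include <sys/syscall.h>
  #include <time.h>
  #include <unistd.h>

  /* Bypass libc so kprobe/tracepoint tests on __NR_nanosleep always fire. */
  static int sleep_usec(unsigned int usec)
  {
      struct timespec ts = {
          .tv_sec  = usec / 1000000,
          .tv_nsec = (usec % 1000000) * 1000,  /* usec -> nsec */
      };

      return syscall(__NR_nanosleep, &ts, NULL);
  }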
2020-03-17 | selftest/bpf: Fix compilation warning in sockmap_parse_prog.c | Andrii Nakryiko | 1 | -1/+0
Remove unused len variable, which causes compilation warnings. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200314001834.3727680-1-andriin@fb.com
2020-03-15 | selftests: mlxsw: RED: Test RED ECN nodrop offload | Petr Machata | 3 | -8/+61
Extend RED testsuite to cover the new nodrop mode of RED-ECN. This test is really similar to ECN test, diverging only in the last step, where UDP traffic should go to backlog instead of being dropped. Thus extract a common helper, ecn_test_common(), make do_ecn_test() into a relatively simple wrapper, and add another one, do_ecn_nodrop_test(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | selftests: qdiscs: RED: Add nodrop tests | Petr Machata | 1 | -0/+68
Add tests for the new "nodrop" flag. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 | selftests: qdiscs: Add TDC test for RED | Petr Machata | 1 | -0/+117
Add a handful of tests for creating RED with different flags. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 65 | -472/+2380
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-03-13

The following pull-request contains BPF updates for your *net-next* tree. We've added 86 non-merge commits during the last 12 day(s) which contain a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).

The main changes are:

1) Add modify_return attach type which allows attaching to a function via BPF trampoline, is run after the fentry and before the fexit programs, and can pass a return code to the original caller, from KP Singh.

2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher objects to be visible in /proc/kallsyms so they can be annotated in stack traces, from Jiri Olsa.

3) Extend BPF sockmap to allow for UDP next to existing TCP support in order to enable this for BPF based socket dispatch, from Lorenz Bauer.

4) Introduce a new bpftool 'prog profile' command which attaches to existing BPF programs via fentry and fexit hooks and reads out hardware counters during that period, from Song Liu. Example usage:

   bpftool prog profile id 337 duration 3 cycles instructions llc_misses
        4228 run_cnt
     3403698 cycles          (84.08%)
     3525294 instructions  # 1.04 insn per cycle  (84.05%)
          13 llc_misses    # 3.69 LLC misses per million isns  (83.50%)

5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition of a new bpf_link abstraction to keep in particular BPF tracing programs attached even when the application owning them exits, from Andrii Nakryiko.

6) New bpf_get_ns_current_pid_tgid() helper for tracing to perform PID filtering and which returns the PID as seen by the init namespace, from Carlos Neira.

7) Refactor of RISC-V JIT code to move out common pieces and addition of a new RV32G BPF JIT compiler, from Luke Nelson.

8) Add gso_size context member to __sk_buff in order to be able to know whether a given skb is GSO or not, from Willem de Bruijn.

9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14 | selftests/bpf: Add vmlinux.h selftest exercising tracing of syscalls | Andrii Nakryiko | 3 | -1/+133
Add vmlinux.h generation to selftest/bpf's Makefile. Use it from the newly added test_vmlinux to trace the nanosleep syscall using 5 different types of programs:
- tracepoint;
- raw tracepoint;
- raw tracepoint w/ direct memory reads (tp_btf);
- kprobe;
- fentry.
These programs are realistic variants of real-life tracing programs, exercising vmlinux.h's usage with tracing applications. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200313172336.1879637-5-andriin@fb.com
2020-03-14 | libbpf: Provide CO-RE variants of PT_REGS macros | Andrii Nakryiko | 1 | -0/+103
Syscall raw tracepoints have struct pt_regs pointer as tracepoint's first argument. After that, reading any of pt_regs fields requires bpf_probe_read(), even for tp_btf programs. Due to that, PT_REGS_PARMx macros are not usable as is. This patch adds CO-RE variants of those macros that use BPF_CORE_READ() to read necessary fields. This provides relocatable architecture-agnostic pt_regs field accesses. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200313172336.1879637-4-andriin@fb.com
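For illustration, reading a syscall argument from a syscall raw tracepoint with one of the new macros might look like the sketch below; the tracepoint name and the printed message are assumptions, and the program needs to be built with -D__TARGET_ARCH_<arch> so the PT_REGS macros resolve:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_core_read.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  SEC("raw_tp/sys_enter")
  int handle_sys_enter(struct bpf_raw_tracepoint_args *ctx)
  {
      /* For syscall tracepoints, args[0] is a struct pt_regs pointer. */
      struct pt_regs *regs = (struct pt_regs *)ctx->args[0];
      long arg1 = PT_REGS_PARM1_CORE(regs);  /* BPF_CORE_READ underneath */

      bpf_printk("first syscall arg: %ld", arg1);
      return 0;
  }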
2020-03-14 | libbpf: Ignore incompatible types with matching name during CO-RE relocation | Andrii Nakryiko | 1 | -0/+4
When finding target type candidates, ignore forward declarations, functions, and other named types of incompatible kind. Not doing this can cause false errors. See [0] for one such case (due to struct pt_regs forward declaration). [0] https://github.com/iovisor/bcc/pull/2806#issuecomment-598543645 Fixes: ddc7c3042614 ("libbpf: implement BPF CO-RE offset relocation algorithm") Reported-by: Wenbo Zhang <ethercflow@gmail.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200313172336.1879637-3-andriin@fb.com
2020-03-14 | selftests/bpf: Ensure consistent test failure output | Andrii Nakryiko | 2 | -9/+9
printf() doesn't seem to honor using overwritten stdout/stderr (as part of stdio hijacking), so ensure all "standard" invocations of printf() do fprintf(stdout, ...) instead. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200313172336.1879637-2-andriin@fb.com
2020-03-13 | selftests/bpf: Fix spurious failures in accept due to EAGAIN | Jakub Sitnicki | 1 | -19/+58
Andrii Nakryiko reports that sockmap_listen test suite is frequently failing due to accept() calls erroring out with EAGAIN: ./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable connect_accept_thread:FAIL:733 This is because we are using a non-blocking listening TCP socket to accept() connections without polling on the socket. While at first switching to blocking mode seems like the right thing to do, this could lead to test process blocking indefinitely in face of a network issue, like loopback interface being down, as Andrii pointed out. Hence, stick to non-blocking mode for TCP listening sockets but with polling for incoming connection for a limited time before giving up. Apply this approach to all socket I/O calls in the test suite that we expect to block indefinitely, that is accept() for TCP and recv() for UDP. Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets") Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200313161049.677700-1-jakub@cloudflare.com
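The pattern described above reduces to something like the helper below; a sketch, with the timeout policy left to the caller:

  #include <errno.h>
  #include <poll.h>
  #include <sys/socket.h>

  /* accept() on a non-blocking listener, but give up after timeout_ms
   * instead of spinning on EAGAIN or blocking forever.
   */
  static int accept_timeout(int listen_fd, int timeout_ms)
  {
      struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
      int ready = poll(&pfd, 1, timeout_ms);

      if (ready < 0)
          return -1;          /* poll error, errno already set */
      if (ready == 0) {
          errno = ETIMEDOUT;  /* no connection within the bound */
          return -1;
      }
      return accept(listen_fd, NULL, NULL);
  }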
2020-03-13 | tools/bpf: Move linux/types.h for selftests and bpftool | Tobias Klauser | 3 | -4/+4
Commit fe4eb069edb7 ("bpftool: Use linux/types.h from source tree for profiler build") added a build dependency on tools/testing/selftests/bpf to tools/bpf/bpftool. This is suboptimal with respect to a possible stand-alone build of bpftool. Fix this by moving tools/testing/selftests/bpf/include/uapi/linux/types.h to tools/include/uapi/linux/types.h. This requires an adjustment in the include search path order for the tests in tools/testing/selftests/bpf so that tools/include/linux/types.h is selected when building host binaries and tools/include/uapi/linux/types.h is selected when building bpf binaries. Verified by compiling bpftool and the bpf selftests on x86_64 with this change. Fixes: fe4eb069edb7 ("bpftool: Use linux/types.h from source tree for profiler build") Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200313113105.6918-1-tklauser@distanz.ch
2020-03-13 | selftests/bpf: Fix usleep() implementation | Andrii Nakryiko | 1 | -1/+10
nanosleep syscall expects pointer to struct timespec, not nanoseconds directly. Current implementation fulfills its purpose of invoking nanosleep syscall, but doesn't really provide sleeping capabilities, which can cause flakiness for tests relying on usleep() to wait for something. Fixes: ec12a57b822c ("selftests/bpf: Guarantee that useep() calls nanosleep() syscall") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200313061837.3685572-1-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-13 | selftests/bpf: Make tcp_rtt test more robust to failures | Andrii Nakryiko | 1 | -12/+20
Switch to non-blocking accept and wait for server thread to exit before proceeding. I noticed that sometimes tcp_rtt server thread failure would "spill over" into other tests (that would run after tcp_rtt), probably just because server thread exits much later and tcp_rtt doesn't wait for it. v1->v2: - add usleep() while waiting on initial non-blocking accept() (Stanislav); Fixes: 8a03222f508b ("selftests/bpf: test_progs: fix client/server race in tcp_rtt") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20200311222749.458015-1-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-13 | selftests/bpf: Guarantee that useep() calls nanosleep() syscall | Andrii Nakryiko | 1 | -0/+9
Some implementations of C runtime library won't call nanosleep() syscall from usleep(). But a bunch of kprobe/tracepoint selftests rely on nanosleep being called to trigger them. To make this more reliable, "override" usleep implementation and call nanosleep explicitly. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Julia Kartseva <hex@fb.com> Link: https://lore.kernel.org/bpf/20200311185345.3874602-1-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-13 | tools: bpftool: Restore message on failure to guess program type | Quentin Monnet | 4 | -11/+35
In commit 4a3d6c6a6e4d ("libbpf: Reduce log level for custom section names"), log level for messages for libbpf_attach_type_by_name() and libbpf_prog_type_by_name() was downgraded from "info" to "debug". The latter function, in particular, is used by bpftool when attempting to load programs, and this change caused bpftool to exit with no hint or error message when it fails to detect the type of the program to load (unless "-d" option was provided). To help users understand why bpftool fails to load the program, let's do a second run of the function with log level in "debug" mode in case of failure. Before: # bpftool prog load sample_ret0.o /sys/fs/bpf/sample_ret0 # echo $? 255 Or really verbose with -d flag: # bpftool -d prog load sample_ret0.o /sys/fs/bpf/sample_ret0 libbpf: loading sample_ret0.o libbpf: section(1) .strtab, size 134, link 0, flags 0, type=3 libbpf: skip section(1) .strtab libbpf: section(2) .text, size 16, link 0, flags 6, type=1 libbpf: found program .text libbpf: section(3) .debug_abbrev, size 55, link 0, flags 0, type=1 libbpf: skip section(3) .debug_abbrev libbpf: section(4) .debug_info, size 75, link 0, flags 0, type=1 libbpf: skip section(4) .debug_info libbpf: section(5) .rel.debug_info, size 32, link 14, flags 0, type=9 libbpf: skip relo .rel.debug_info(5) for section(4) libbpf: section(6) .debug_str, size 150, link 0, flags 30, type=1 libbpf: skip section(6) .debug_str libbpf: section(7) .BTF, size 155, link 0, flags 0, type=1 libbpf: section(8) .BTF.ext, size 80, link 0, flags 0, type=1 libbpf: section(9) .rel.BTF.ext, size 32, link 14, flags 0, type=9 libbpf: skip relo .rel.BTF.ext(9) for section(8) libbpf: section(10) .debug_frame, size 40, link 0, flags 0, type=1 libbpf: skip section(10) .debug_frame libbpf: section(11) .rel.debug_frame, size 16, link 14, flags 0, type=9 libbpf: skip relo .rel.debug_frame(11) for section(10) libbpf: section(12) .debug_line, size 74, link 0, flags 0, type=1 libbpf: skip section(12) .debug_line libbpf: section(13) .rel.debug_line, size 16, link 14, flags 0, type=9 libbpf: skip relo .rel.debug_line(13) for section(12) libbpf: section(14) .symtab, size 96, link 1, flags 0, type=2 libbpf: looking for externs among 4 symbols... libbpf: collected 0 externs total libbpf: failed to guess program type from ELF section '.text' libbpf: supported section(type) names are: socket sk_reuseport kprobe/ [...] After: # bpftool prog load sample_ret0.o /sys/fs/bpf/sample_ret0 libbpf: failed to guess program type from ELF section '.text' libbpf: supported section(type) names are: socket sk_reuseport kprobe/ [...] Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200311021205.9755-1-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | David S. Miller | 31 | -236/+305
Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 | bpf: Add bpf_xdp_output() helper | Eelco Chaudron | 3 | -1/+102
Introduce new helper that reuses existing xdp perf_event output implementation, but can be called from raw_tracepoint programs that receive 'struct xdp_buff *' as a tracepoint argument. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
2020-03-13 | tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid. | Carlos Neira | 5 | -1/+287
Self tests added for the new helper bpf_get_ns_current_pid_tgid. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200304204157.58695-4-cneirabustos@gmail.com
2020-03-13 | bpf: Added new helper bpf_get_ns_current_pid_tgid | Carlos Neira | 1 | -1/+19
New bpf helper bpf_get_ns_current_pid_tgid. This helper will return the pid and tgid from the current task whose pid namespace matches the dev_t and inode number provided; this allows us to instrument a process inside a container. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
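A sketch of how a tracing program might use it; userspace is expected to stat() e.g. /proc/self/ns/pid and share the device/inode pair, and the globals and tracepoint below are illustrative assumptions:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  __u64 pidns_dev;  /* filled by userspace from stat() on /proc/<pid>/ns/pid */
  __u64 pidns_ino;

  SEC("tracepoint/syscalls/sys_enter_nanosleep")
  int trace_nanosleep(void *ctx)
  {
      struct bpf_pidns_info nsinfo = {};

      if (bpf_get_ns_current_pid_tgid(pidns_dev, pidns_ino,
                                      &nsinfo, sizeof(nsinfo)))
          return 0;  /* current task is not in the requested pid namespace */

      bpf_printk("pid as seen in the namespace: %u", nsinfo.pid);
      return 0;
  }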
2020-03-13 | tools: bpftool: Fix minor bash completion mistakes | Quentin Monnet | 1 | -8/+21
Minor fixes for bash completion: addition of program name completion for two subcommands, and correction for program test-runs and map pinning. The completion for the following commands is fixed or improved:

  # bpftool prog run [TAB]
  # bpftool prog pin [TAB]
  # bpftool map pin [TAB]
  # bpftool net attach xdp name [TAB]

Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200312184608.12050-3-quentin@isovalent.com
2020-03-13 | tools: bpftool: Allow all prog/map handles for pinning objects | Quentin Monnet | 4 | -32/+7
Documentation and interactive help for bpftool have always explained that the regular handles for programs (id|name|tag|pinned) and maps (id|name|pinned) can be passed to the utility when attempting to pin objects (bpftool prog pin PROG / bpftool map pin MAP). THIS IS A LIE!! The tool actually accepts only ids, as the parsing is done in do_pin_any() in common.c instead of reusing the parsing functions that have long been generic for program and map handles. Instead of fixing the doc, fix the code. It is trivial to reuse the generic parsing, and to simplify do_pin_any() in the process. Do not accept to pin multiple objects at the same time with prog_parse_fds() or map_parse_fds() (this would require a more complex syntax for passing multiple sysfs paths and validating that they correspond to the number of e.g. programs we find for a given name or tag). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200312184608.12050-2-quentin@isovalent.com
2020-03-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 2 | -3/+32
Pull networking fixes from David Miller:
 "It looks like a decent sized set of fixes, but a lot of these are one liner off-by-one and similar type changes:

  1) Fix netlink header pointer to calculate bad attribute offset reported to user. From Pablo Neira Ayuso.

  2) Don't double clear PHY interrupts when ->did_interrupt is set, from Heiner Kallweit.

  3) Add missing validation of various (devlink, nl802154, fib, etc.) attributes, from Jakub Kicinski.

  4) Missing *pos increments in various netfilter seq_next ops, from Vasily Averin.

  5) Missing break in of_mdiobus_register() loop, from Dajun Jin.

  6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.

  7) Work around FMAN erratum A050385, from Madalin Bucur.

  8) Make sure ARP header is pulled early enough in bonding driver, from Eric Dumazet.

  9) Do a cond_resched() during multicast processing of ipvlan and macvlan, from Mahesh Bandewar.

 10) Don't attach cgroups to unrelated sockets when in interrupt context, from Shakeel Butt.

 11) Fix tpacket ring state management when encountering unknown GSO types. From Willem de Bruijn.

 12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend() only in the suspend context. From Heiner Kallweit"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  net: systemport: fix index check to avoid an array out of bounds access
  tc-testing: add ETS scheduler to tdc build configuration
  net: phy: fix MDIO bus PM PHY resuming
  net: hns3: clear port base VLAN when unload PF
  net: hns3: fix RMW issue for VLAN filter switch
  net: hns3: fix VF VLAN table entries inconsistent issue
  net: hns3: fix "tc qdisc del" failed issue
  taprio: Fix sending packets without dequeueing them
  net: mvmdio: avoid error message for optional IRQ
  net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
  net: memcg: fix lockdep splat in inet_csk_accept()
  s390/qeth: implement smarter resizing of the RX buffer pool
  s390/qeth: refactor buffer pool code
  s390/qeth: use page pointers to manage RX buffer pool
  seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
  net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
  net/packet: tpacket_rcv: do not increment ring index on drop
  sxgbe: Fix off by one in samsung driver strncpy size arg
  net: caif: Add lockdep expression to RCU traversal primitive
  MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
  ...