kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2018-05-24	ipv6: sr: export function lookup_nexthop	Mathieu Xhonneux	3	-10/+37
	The function lookup_nexthop is essential to implement most of the seg6local actions. As we want to provide a BPF helper allowing to apply some of these actions on the packet being processed, the helper should be able to call this function, hence the need to make it public. Moreover, if one argument is incorrect or if the next hop can not be found, an error should be returned by the BPF helper so the BPF program can adapt its processing of the packet (return an error, properly force the drop, ...). This patch hence makes this function return dst->error to indicate a possible error. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	ipv6: sr: make seg6.h includable without IPv6	Mathieu Xhonneux	1	-0/+4
	include/net/seg6.h cannot be included in a source file if CONFIG_IPV6 is not enabled: include/net/seg6.h: In function 'seg6_pernet': >> include/net/seg6.h:52:14: error: 'struct net' has no member named 'ipv6'; did you mean 'ipv4'? return net->ipv6.seg6_data; ^~~~ ipv4 This commit makes seg6_pernet return NULL if IPv6 is not compiled, hence allowing seg6.h to be included regardless of the configuration. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	Merge branch 'bpf-multi-prog-improvements'	Daniel Borkmann	8	-35/+301
	Sandipan Das says: ==================== [1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler. [2] Provide a way for resolving function calls because of the way JITed images are allocated in powerpc64. [3] Fix to get JITed instruction dumps for multi-function programs from the bpf system call. [4] Fix for bpftool to show delimited multi-function JITed image dumps. v4: - Incorporate review comments from Jakub. - Fix JSON output for bpftool. v3: - Change base tree tag to bpf-next. - Incorporate review comments from Alexei, Daniel and Jakub. - Make sure that the JITed image does not grow or shrink after the last pass due to the way the instruction sequence used to load a callee's address maybe optimized. - Make additional changes to the bpf system call and bpftool to make multi-function JITed dumps easier to correlate. v2: - Incorporate review comments from Jakub. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	tools: bpftool: add delimiters to multi-function JITed dumps	Sandipan Das	3	-3/+75
	This splits up the contiguous JITed dump obtained via the bpf system call into more relatable chunks for each function in the program. If the kernel symbols corresponding to these are known, they are printed in the header for each JIT image dump otherwise the masked start address is printed. Before applying this patch: # bpftool prog dump jited id 1 0: push %rbp 1: mov %rsp,%rbp ... 70: leaveq 71: retq 72: push %rbp 73: mov %rsp,%rbp ... dd: leaveq de: retq # bpftool -p prog dump jited id 1 [{ "pc": "0x0", "operation": "push", "operands": ["%rbp" ] },{ ... },{ "pc": "0x71", "operation": "retq", "operands": [null ] },{ "pc": "0x72", "operation": "push", "operands": ["%rbp" ] },{ ... },{ "pc": "0xde", "operation": "retq", "operands": [null ] } ] After applying this patch: # echo 0 > /proc/sys/net/core/bpf_jit_kallsyms # bpftool prog dump jited id 1 0xffffffffc02c7000: 0: push %rbp 1: mov %rsp,%rbp ... 70: leaveq 71: retq 0xffffffffc02cf000: 0: push %rbp 1: mov %rsp,%rbp ... 6b: leaveq 6c: retq # bpftool -p prog dump jited id 1 [{ "name": "0xffffffffc02c7000", "insns": [{ "pc": "0x0", "operation": "push", "operands": ["%rbp" ] },{ ... },{ "pc": "0x71", "operation": "retq", "operands": [null ] } ] },{ "name": "0xffffffffc02cf000", "insns": [{ "pc": "0x0", "operation": "push", "operands": ["%rbp" ] },{ ... },{ "pc": "0x6c", "operation": "retq", "operands": [null ] } ] } ] # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms # bpftool prog dump jited id 1 bpf_prog_b811aab41a39ad3d_foo: 0: push %rbp 1: mov %rsp,%rbp ... 70: leaveq 71: retq bpf_prog_cf418ac8b67bebd9_F: 0: push %rbp 1: mov %rsp,%rbp ... 6b: leaveq 6c: retq # bpftool -p prog dump jited id 1 [{ "name": "bpf_prog_b811aab41a39ad3d_foo", "insns": [{ "pc": "0x0", "operation": "push", "operands": ["%rbp" ] },{ ... },{ "pc": "0x71", "operation": "retq", "operands": [null ] } ] },{ "name": "bpf_prog_cf418ac8b67bebd9_F", "insns": [{ "pc": "0x0", "operation": "push", "operands": ["%rbp" ] },{ ... },{ "pc": "0x6c", "operation": "retq", "operands": [null ] } ] } ] Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	tools: bpf: sync bpf uapi header	Sandipan Das	1	-0/+2
	Syncing the bpf.h uapi header with tools so that struct bpf_prog_info has the two new fields for passing on the JITed image lengths of each function in a multi-function program. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: get JITed image lengths of functions via syscall	Sandipan Das	2	-0/+22
	This adds new two new fields to struct bpf_prog_info. For multi-function programs, these fields can be used to pass a list of the JITed image lengths of each function for a given program to userspace using the bpf system call with the BPF_OBJ_GET_INFO_BY_FD command. This can be used by userspace applications like bpftool to split up the contiguous JITed dump, also obtained via the system call, into more relatable chunks corresponding to each function. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: fix multi-function JITed dump obtained via syscall	Sandipan Das	1	-3/+34
	Currently, for multi-function programs, we cannot get the JITed instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD command. Because of this, userspace tools such as bpftool fail to identify a multi-function program as being JITed or not. With the JIT enabled and the test program running, this can be verified as follows: # cat /proc/sys/net/core/bpf_jit_enable 1 Before applying this patch: # bpftool prog list 1: kprobe name foo tag b811aab41a39ad3d gpl loaded_at 2018-05-16T11:43:38+0530 uid 0 xlated 216B not jited memlock 65536B ... # bpftool prog dump jited id 1 no instructions returned After applying this patch: # bpftool prog list 1: kprobe name foo tag b811aab41a39ad3d gpl loaded_at 2018-05-16T12:13:01+0530 uid 0 xlated 216B jited 308B memlock 65536B ... # bpftool prog dump jited id 1 0: nop 4: nop 8: mflr r0 c: std r0,16(r1) 10: stdu r1,-112(r1) 14: std r31,104(r1) 18: addi r31,r1,48 1c: li r3,10 ... Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	tools: bpftool: resolve calls without using imm field	Sandipan Das	3	-1/+35
	Currently, we resolve the callee's address for a JITed function call by using the imm field of the call instruction as an offset from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further use this address to get the callee's kernel symbol's name. For some architectures, such as powerpc64, the imm field is not large enough to hold this offset. So, instead of assigning this offset to the imm field, the verifier now assigns the subprog id. Also, a list of kernel symbol addresses for all the JITed functions is provided in the program info. We now use the imm field as an index for this list to lookup a callee's symbol's address and resolve its name. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	tools: bpf: sync bpf uapi header	Sandipan Das	1	-0/+2
	Syncing the bpf.h uapi header with tools so that struct bpf_prog_info has the two new fields for passing on the addresses of the kernel symbols corresponding to each function in a program. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: get kernel symbol addresses via syscall	Sandipan Das	3	-6/+28
	This adds new two new fields to struct bpf_prog_info. For multi-function programs, these fields can be used to pass a list of kernel symbol addresses for all functions in a given program to userspace using the bpf system call with the BPF_OBJ_GET_INFO_BY_FD command. When bpf_jit_kallsyms is enabled, we can get the address of the corresponding kernel symbol for a callee function and resolve the symbol's name. The address is determined by adding the value of the call instruction's imm field to __bpf_call_base. This offset gets assigned to the imm field by the verifier. For some architectures, such as powerpc64, the imm field is not large enough to hold this offset. We resolve this by: [1] Assigning the subprog id to the imm field of a call instruction in the verifier instead of the offset of the callee's symbol's address from __bpf_call_base. [2] Determining the address of a callee's corresponding symbol by using the imm field as an index for the list of kernel symbol addresses now available from the program info. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: powerpc64: add JIT support for multi-function programs	Sandipan Das	1	-10/+66
	This adds support for bpf-to-bpf function calls in the powerpc64 JIT compiler. The JIT compiler converts the bpf call instructions to native branch instructions. After a round of the usual passes, the start addresses of the JITed images for the callee functions are known. Finally, to fixup the branch target addresses, we need to perform an extra pass. Because of the address range in which JITed images are allocated on powerpc64, the offsets of the start addresses of these images from __bpf_call_base are as large as 64 bits. So, for a function call, we cannot use the imm field of the instruction to determine the callee's address. Instead, we use the alternative method of getting it from the list of function addresses in the auxiliary data of the caller by using the off field as an index. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: powerpc64: pad function address loads with NOPs	Sandipan Das	1	-11/+23
	For multi-function programs, loading the address of a callee function to a register requires emitting instructions whose count varies from one to five depending on the nature of the address. Since we come to know of the callee's address only before the extra pass, the number of instructions required to load this address may vary from what was previously generated. This can make the JITed image grow or shrink. To avoid this, we should generate a constant five-instruction when loading function addresses by padding the optimized load sequence with NOPs. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: support 64-bit offsets for bpf function calls	Sandipan Das	1	-1/+14
	The imm field of a bpf instruction is a signed 32-bit integer. For JITed bpf-to-bpf function calls, it holds the offset of the start address of the callee's JITed image from __bpf_call_base. For some architectures, such as powerpc64, this offset may be as large as 64 bits and cannot be accomodated in the imm field without truncation. We resolve this by: [1] Additionally using the auxiliary data of each function to keep a list of start addresses of the JITed images for all functions determined by the verifier. [2] Retaining the subprog id inside the off field of the call instructions and using it to index into the list mentioned above and lookup the callee's address. To make sure that the existing JIT compilers continue to work without requiring changes, we keep the imm field as it is. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24	bpf: btf: Avoid variable length array	Martin KaFai Lau	1	-6/+5
	Sparse warning: kernel/bpf/btf.c:1985:34: warning: Variable length array is used. This patch directly uses ARRAY_SIZE(). Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	tools/lib/libbpf.c: fix string format to allow build on arm32	Sirio Balmelli	1	-4/+4
	On arm32, 'cd tools/testing/selftests/bpf && make' fails with: libbpf.c:80:10: error: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘int64_t {aka long long int}’ [-Werror=format=] (func)("libbpf: " fmt, ##__VA_ARGS__); \ ^ libbpf.c:83:30: note: in expansion of macro ‘__pr’ #define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__) ^~~~ libbpf.c:1072:3: note: in expansion of macro ‘pr_warning’ pr_warning("map:%s value_type:%s has BTF type_size:%ld != value_size:%u\n", To fix, typecast 'key_size' and amend format string. Signed-off-by: Sirio Balmelli <sirio@b-ad.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	selftests/bpf: Makefile fix "missing" headers on build with -idirafter	Sirio Balmelli	1	-0/+10
	Selftests fail to build on several distros/architectures because of missing headers files. On a Ubuntu/x86_64 some missing headers are: asm/byteorder.h, asm/socket.h, asm/sockios.h On a Debian/arm32 build already fails at sys/cdefs.h In both cases, these already exist in /usr/include/<arch-specific-dir>, but Clang does not include these when using '-target bpf' flag, since it is no longer compiling against the host architecture. The solution is to: - run Clang without '-target bpf' and extract the include chain for the current system - add these to the bpf build with '-idirafter' The choice of -idirafter is to catch this error without injecting unexpected include behavior: if an arch-specific tree is built for bpf in the future, this will be correctly found by Clang. Signed-off-by: Sirio Balmelli <sirio@b-ad.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	Merge branch 'btf-uapi-cleanups'	Daniel Borkmann	14	-284/+753
	Martin KaFai Lau says: ==================== This patch set makes some changes to cleanup the unused bits in BTF uapi. It also makes the btf_header extensible. Please see individual patches for details. v2: - Remove NR_SECS from patch 2 - Remove "unsigned" check on array->index_type from patch 3 - Remove BTF_INT_VARARGS and further limit BTF_INT_ENCODING from 8 bits to 4 bits in patch 4 - Adjustments in test_btf.c to reflect changes in v2 ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: btf: Add tests for the btf uapi changes	Martin KaFai Lau	6	-116/+456
	This patch does the followings: 1. Modify libbpf and test_btf to reflect the uapi changes in btf 2. Add test for the btf_header changes 3. Add tests for array->index_type 4. Add err_str check to the tests 5. Fix a 4 bytes hole in "struct test #1" by swapping "m" and "n" Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: btf: Sync bpf.h and btf.h to tools	Martin KaFai Lau	2	-30/+15
	This patch sync the uapi bpf.h and btf.h to tools. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info	Martin KaFai Lau	4	-16/+16
	In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id" could cause confusion because the "id" of "btf_id" means the BPF obj id given to the BTF object while "btf_key_id" and "btf_value_id" means the BTF type id within that BTF object. To make it clear, btf_key_id and btf_value_id are renamed to btf_key_type_id and btf_value_type_id. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: btf: Remove unused bits from uapi/linux/btf.h	Martin KaFai Lau	2	-37/+44
	This patch does the followings: 1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k. We can raise it later. 2. Remove the BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID. They are currently encoded at the highest bit of a u32. It is because the current use case does not require supporting parent type (i.e type_id referring to a type in another BTF file). It also does not support referring to a string in ELF. The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are defined in btf.c instead of uapi/linux/btf.h. 3. Limit the BTF_INFO_KIND from 5 bits to 4 bits which is enough. There is unused bits headroom if we ever needed it later. 4. The root bit in BTF_INFO is also removed because it is not used in the current use case. 5. Remove BTF_INT_VARARGS since func type is not supported now. The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits. The above can be added back later because the verifier ensures the unused bits are zeros. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: btf: Check array->index_type	Martin KaFai Lau	1	-24/+56
	Instead of ingoring the array->index_type field. Enforce that it must be a BTF_KIND_INT in size 1/2/4/8 bytes. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: btf: Change how section is supported in btf_header	Martin KaFai Lau	2	-57/+160
	There are currently unused section descriptions in the btf_header. Those sections are here to support future BTF use cases. For example, the func section (func_off) is to support function signature (e.g. the BPF prog function signature). Instead of spelling out all potential sections up-front in the btf_header. This patch makes changes to btf_header such that extending it (e.g. adding a section) is possible later. The unused ones can be removed for now and they can be added back later. This patch: 1. adds a hdr_len to the btf_header. It will allow adding sections (and other info like parent_label and parent_name) later. The check is similar to the existing bpf_attr. If a user passes in a longer hdr_len, the kernel ensures the extra tailing bytes are 0. 2. allows the section order in the BTF object to be different from its sec_off order in btf_header. 3. each sec_off is followed by a sec_len. It must not have gap or overlapping among sections. The string section is ensured to be at the end due to the 4 bytes alignment requirement of the type section. The above changes will allow enough flexibility to add new sections (and other info) to the btf_header later. This patch also removes an unnecessary !err check at the end of btf_parse(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23	bpf: Expose check_uarg_tail_zero()	Martin KaFai Lau	2	-7/+9
	This patch exposes check_uarg_tail_zero() which will be reused by a later BTF patch. Its name is changed to bpf_check_uarg_tail_zero(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	Merge branch 'bpf-fib-mtu-check'	Daniel Borkmann	9	-7/+136
	David Ahern says: ==================== Packets that exceed the egress MTU can not be forwarded in the fast path. Add IPv4 and IPv6 MTU helpers that take a FIB lookup result (versus the typical dst path) and add the calls to bpf_ipv{4,6}_fib_lookup. v2 - add ip6_mtu_from_fib6 to ipv6_stub - only call the new MTU helpers for fib lookups in XDP path; skb path uses is_skb_forwardable to determine if the packet can be sent via the egress device from the FIB lookup ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	bpf: Add mtu checking to FIB forwarding helper	David Ahern	1	-7/+35
	Add check that egress MTU can handle packet to be forwarded. If the MTU is less than the packet length, return 0 meaning the packet is expected to continue up the stack for help - eg., fragmenting the packet or sending an ICMP. The XDP path needs to leverage the FIB entry for an MTU on the route spec or an exception entry for a given destination. The skb path lets is_skb_forwardable decide if the packet can be sent. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	net/ipv6: Add helper to return path MTU based on fib result	David Ahern	6	-0/+68
	Determine path MTU from a FIB lookup result. Logic is based on ip6_dst_mtu_forward plus lookup of nexthop exception. Add ip6_dst_mtu_forward to ipv6_stubs to handle access by core bpf code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	net/ipv4: Add helper to return path MTU based on fib result	David Ahern	2	-0/+33
	Determine path MTU from a FIB lookup result. Logic is a distillation of ip_dst_mtu_maybe_forward. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	Merge branch 'bpf-af-xdp-cleanups'	Daniel Borkmann	6	-156/+225
	Björn Töpel says: ==================== This the second follow-up set. The first four patches are uapi changes: * Removing rebind support * Getting rid of structure hole * Removing explicit cache line alignment * Stricter bind checks The last patches do some cleanups, where the umem and refcount_t changes were suggested by Daniel. * Add a missing write-barrier and use READ_ONCE for data-dependencies * Clean up umem and do proper locking * Convert atomic_t to refcount_t ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: convert atomic_t to refcount_t	Björn Töpel	2	-4/+4
	Introduce refcount_t, in favor of atomic_t. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: simplified umem setup	Björn Töpel	3	-55/+51
	As suggested by Daniel Borkmann, the umem setup code was a too defensive and complex. Here, we reduce the number of checks. Also, the memory pinning is now folded into the umem creation, and we do correct locking. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: add missing write- and data-dependency barrier	Björn Töpel	1	-5/+9
	Here, we add a missing write-barrier, and use READ_ONCE for the data-dependency barrier. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	samples/bpf: adapt xdpsock to the new uapi	Björn Töpel	1	-47/+76
	Adapt xdpsock to use the new getsockopt introduced in the previous commit. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: remove explicit ring structure from uapi	Björn Töpel	3	-22/+68
	In this commit we remove the explicit ring structure from the the uapi. It is tricky for an uapi to depend on a certain L1 cache line size, since it can differ for variants of the same architecture. Now, we let the user application determine the offsets of the producer, consumer and descriptors by asking the socket via getsockopt. A typical flow would be (Rx ring): struct xdp_mmap_offsets off; struct xdp_desc ring; u32 prod, cons; void map; ... getsockopt(fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen); map = mmap(NULL, off.rx.desc + NUM_DESCS * sizeof(struct xdp_desc), PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_POPULATE, sfd, XDP_PGOFF_RX_RING); prod = map + off.rx.producer; cons = map + off.rx.consumer; ring = map + off.rx.desc; Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: proper queue id check at bind	Magnus Karlsson	1	-1/+7
	Validate the queue id against both Rx and Tx on the netdev. Also, make sure that the queue exists at xmit time. Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: fill hole in struct sockaddr_xdp	Björn Töpel	1	-1/+1
	Move the sxdp_flags up, avoiding a hole in the uapi structure. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22	xsk: remove rebind support	Björn Töpel	1	-21/+9
	Supporting rebind, i.e. after a successful bind the process can call bind again without closing the socket, makes the AF_XDP setup state machine more complex. Constrain the state space, by not supporting rebind. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-19	Merge branch 'bpf-sk-msg-fields'	Daniel Borkmann	6	-3/+244
	John Fastabend says: ==================== In this series we add the ability for sk msg programs to read basic sock information about the sock they are attached to. The second patch adds the tests to the selftest test_verifier. One observation that I had from writing this seriess is lots of the ./net/core/filter.c code is almost duplicated across program types. I thought about building a template/macro that we could use as a single block of code to read sock data out for multiple programs, but I wasn't convinced it was worth it yet. The result was using a macro saved a couple lines of code per block but made the code a bit harder to read IMO. We can probably revisit the idea later if we get more duplication. v2: add errstr field to negative test_verifier test cases to ensure we get the expected err string back from the verifier. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	bpf: add sk_msg prog sk access tests to test_verifier	John Fastabend	2	-0/+123
	Add tests for BPF_PROG_TYPE_SK_MSG to test_verifier for read access to new sk fields. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	bpf: allow sk_msg programs to read sock fields	John Fastabend	4	-3/+121
	Currently sk_msg programs only have access to the raw data. However, it is often useful when building policies to have the policies specific to the socket endpoint. This allows using the socket tuple as input into filters, etc. This patch adds ctx access to the sock fields. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	Merge branch 'bpf-nfp-shift-insns'	Daniel Borkmann	5	-31/+435
	Jiong Wang says: ==================== NFP eBPF JIT is missing logic indirect shifts (both left and right) and arithmetic right shift (both indirect shift and shift by constant). This patch adds support for them. For indirect shifts, shift amount is not specified as constant, NFP needs to get the shift amount through the low 5 bits of source A operand in PREV_ALU, therefore extra instructions are needed compared with shifts by constants. Because NFP is 32-bit, so we are using register pair for 64-bit shifts and therefore would need different instruction sequences depending on whether shift amount is less than 32 or not. NFP branch-on-bit-test instruction emitter is added by this patch set and is used for efficient runtime check on shift amount. We'd think the shift amount is less than 32 if bit 5 is clear and greater or equal then 32 otherwise. Shift amount is greater than or equal to 64 will result in undefined behavior. This patch also use range info to avoid generating unnecessary runtime code if we are certain shift amount is less than 32 or not. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	nfp: bpf: support arithmetic indirect right shift (BPF_ARSH \| BPF_X)	Jiong Wang	1	-10/+89
	Code logic is similar with arithmetic right shift by constant, and NFP get indirect shift amount through source A operand of PREV_ALU. It is possible to fall back to logic right shift if the MSB is known to be zero from range info, however there is no benefit to do this given logic indirect right shift use the same number and cycle of instruction sequence. Suppose the MSB of regX is the bit we want to replicate to fill in all the vacant positions, and regY contains the shift amount, then we could use single instruction to set up both. [alu, --, regY, OR, regX] -- NOTE: the PREV_ALU result doesn't need to write to any destination register. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	nfp: bpf: support arithmetic right shift by constant (BPF_ARSH \| BPF_K)	Jiong Wang	2	-0/+35
	Code logic is similar with logic right shift except we also need to set PREV_ALU result properly, the MSB of which is the bit that will be replicated to fill in all the vacant positions. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	nfp: bpf: support logic indirect shifts (BPF_[L\|R]SH \| BPF_X)	Jiong Wang	5	-32/+322
	For indirect shifts, shift amount is not specified as constant, NFP needs to get the shift amount through the low 5 bits of source A operand in PREV_ALU, therefore extra instructions are needed compared with shifts by constants. Because NFP is 32-bit, so we are using register pair for 64-bit shifts and therefore would need different instruction sequences depending on whether shift amount is less than 32 or not. NFP branch-on-bit-test instruction emitter is added by this patch and is used for efficient runtime check on shift amount. We'd think the shift amount is less than 32 if bit 5 is clear and greater or equal than 32 otherwise. Shift amount is greater than or equal to 64 will result in undefined behavior. This patch also use range info to avoid generating unnecessary runtime code if we are certain shift amount is less than 32 or not. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	Merge branch 'bpf-af-xdp-cleanups'	Daniel Borkmann	11	-127/+34
	Björn Töpel says: ==================== This series contain "cosmetics only" follow-up patches for AF_XDP. Thanks to Daniel for suggesting them! ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	xsk: proper '=' alignment	Björn Töpel	1	-18/+18
	Properly align xsk_proto_ops initialization. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	xsk: fixed some cases of unnecessary parentheses	Björn Töpel	3	-6/+5
	Removed some cases of unnecessary parentheses. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	xsk: remove newline at end of file	Björn Töpel	1	-1/+0
	Minor cleanup, remove newline at end of Makefile. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18	xsk: clean up SPDX headers	Björn Töpel	10	-102/+11
	Clean up SPDX-License-Identifier and removing licensing leftovers. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-17	bpf: sockmap, fix double-free	Gustavo A. R. Silva	1	-1/+0
	`e' is being freed twice. Fix this by removing one of the kfree() calls. Addresses-Coverity-ID: 1468983 ("Double free") Fixes: 81110384441a ("bpf: sockmap, add hash map support") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>