Age  Commit message  Author  Files  Lines
2018-06-03  bpf: fix context access in tracing progs on 32 bit archs  Daniel Borkmann  3  -9/+34
Wang reported that all the testcases for BPF_PROG_TYPE_PERF_EVENT program type in test_verifier report the following errors on x86_32:

  172/p unpriv: spill/fill of different pointers ldx FAIL
  Unexpected error message!
  0: (bf) r6 = r10
  1: (07) r6 += -8
  2: (15) if r1 == 0x0 goto pc+3
  R1=ctx(id=0,off=0,imm=0) R6=fp-8,call_-1 R10=fp0,call_-1
  3: (bf) r2 = r10
  4: (07) r2 += -76
  5: (7b) *(u64 *)(r6 +0) = r2
  6: (55) if r1 != 0x0 goto pc+1
  R1=ctx(id=0,off=0,imm=0) R2=fp-76,call_-1 R6=fp-8,call_-1 R10=fp0,call_-1 fp-8=fp
  7: (7b) *(u64 *)(r6 +0) = r1
  8: (79) r1 = *(u64 *)(r6 +0)
  9: (79) r1 = *(u64 *)(r1 +68)
  invalid bpf_context access off=68 size=8

  378/p check bpf_perf_event_data->sample_period byte load permitted FAIL
  Failed to load prog 'Permission denied'!
  0: (b7) r0 = 0
  1: (71) r0 = *(u8 *)(r1 +68)
  invalid bpf_context access off=68 size=1

  379/p check bpf_perf_event_data->sample_period half load permitted FAIL
  Failed to load prog 'Permission denied'!
  0: (b7) r0 = 0
  1: (69) r0 = *(u16 *)(r1 +68)
  invalid bpf_context access off=68 size=2

  380/p check bpf_perf_event_data->sample_period word load permitted FAIL
  Failed to load prog 'Permission denied'!
  0: (b7) r0 = 0
  1: (61) r0 = *(u32 *)(r1 +68)
  invalid bpf_context access off=68 size=4

  381/p check bpf_perf_event_data->sample_period dword load permitted FAIL
  Failed to load prog 'Permission denied'!
  0: (b7) r0 = 0
  1: (79) r0 = *(u64 *)(r1 +68)
  invalid bpf_context access off=68 size=8

Reason is that struct pt_regs on x86_32 doesn't fully align to 8 byte boundary due to its size of 68 bytes. Therefore, bpf_ctx_narrow_access_ok() will then bail out saying that off & (size_default - 1) which is 68 & 7 doesn't cleanly align in the case of sample_period access from struct bpf_perf_event_data, hence verifier wrongly thinks we might be doing an unaligned access here though underlying arch can handle it just fine. Therefore adjust this down to machine size and check and rewrite the offset for narrow access on that basis.

We also need to fix corresponding pe_prog_is_valid_access(), since we hit the check for off % size != 0 (e.g. 68 % 8 -> 4) in the first and last test. With that in place, progs for tracing work on x86_32.

Reported-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
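The alignment arithmetic behind this fix can be sketched in plain C. This is an illustration only, not the verifier's code: `offset_is_aligned()` and `SAMPLE_PERIOD_OFF_X86_32` are hypothetical names, and the 8 and 4 byte sizes stand in for the default field size and the 32 bit machine word size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of sample_period when it directly follows a 68 byte
 * struct pt_regs, as described for x86_32 (hypothetical constant). */
enum { SAMPLE_PERIOD_OFF_X86_32 = 68 };

/* Hypothetical helper: true if 'off' is a multiple of 'align'.
 * 'align' must be a power of two. */
bool offset_is_aligned(uint32_t off, uint32_t align)
{
	return (off & (align - 1)) == 0;
}
```

With an 8 byte granularity the check fails for offset 68 (68 & 7 is 4), while with the 4 byte machine word it passes (68 & 3 is 0).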
2018-06-03  bpf: fix cbpf parser bug for octal numbers  Daniel Borkmann  1  -1/+1
Range is 0-7, not 0-9, otherwise parser silently excludes it from the strtol() rather than throwing an error. Reported-by: Marc Boschma <marc@boschma.cx> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
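The silent truncation mentioned here is standard strtol() behaviour: with base 8 it stops at the first character that is not an octal digit and reports no error by itself. A minimal sketch of a strict check follows; `parse_octal_strict()` is a hypothetical helper for illustration and is not the code touched by this commit.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strict parser: accepts only strings that consist
 * entirely of octal digits. Returns 0 and stores the value in *out
 * on success, -1 otherwise. */
int parse_octal_strict(const char *s, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, 8);
	/* Reject overflow, empty input, and trailing non-octal characters. */
	if (errno != 0 || end == s || *end != '\0')
		return -1;
	*out = val;
	return 0;
}
```

For the input "019", a plain strtol() call with base 8 would return 1 and leave the end pointer at '9'; the strict variant rejects the string instead.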
2018-06-03  bpf: make sure to clear unused fields in tunnel/xfrm state fetch  Daniel Borkmann  2  -1/+8
Since the remaining bits are not filled in struct bpf_tunnel_key resp. struct bpf_xfrm_state and originate from uninitialized stack space, we should make sure to clear them before handing control back to the program. Also add a padding element to struct bpf_xfrm_state for future use similar as we have in struct bpf_tunnel_key and clear it as well.

  struct bpf_xfrm_state {
          __u32                      reqid;            /*     0     4 */
          __u32                      spi;              /*     4     4 */
          __u16                      family;           /*     8     2 */

          /* XXX 2 bytes hole, try to pack */

          union {
                  __u32              remote_ipv4;      /*           4 */
                  __u32              remote_ipv6[4];   /*          16 */
          };                                           /*    12    16 */

          /* size: 28, cachelines: 1, members: 4 */
          /* sum members: 26, holes: 1, sum holes: 2 */
          /* last cacheline: 28 bytes */
  };

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
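The idea of clearing everything that is not explicitly filled can be sketched in user-space C. The struct below is a hypothetical mirror of the layout quoted in the commit message, with the 2 byte hole turned into an explicit member; the name `pad` and the helper `example_fill_ipv4()` are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the quoted layout, hole made explicit. */
struct example_xfrm_state {
	uint32_t reqid;
	uint32_t spi;
	uint16_t family;
	uint16_t pad;			/* explicit padding, kept zero */
	union {
		uint32_t remote_ipv4;
		uint32_t remote_ipv6[4];
	};
};

/* Fill 'to' for an IPv4 peer. Zeroing the whole object first makes
 * sure the padding and the unused tail of the union carry no stale
 * stack contents. */
void example_fill_ipv4(struct example_xfrm_state *to, uint32_t reqid,
		       uint32_t spi, uint16_t family, uint32_t addr)
{
	memset(to, 0, sizeof(*to));
	to->reqid = reqid;
	to->spi = spi;
	to->family = family;
	to->remote_ipv4 = addr;
}
```

Without the memset(), the last 12 bytes of the union would keep whatever the caller's buffer held before the call.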
2018-06-03  bpf: add bpf_skb_cgroup_id helper  Daniel Borkmann  2  -3/+45
Add a new bpf_skb_cgroup_id() helper that allows to retrieve the cgroup id from the skb's socket. This is useful in particular to enable bpf_get_cgroup_classid()-like behavior for cgroup v1 in cgroup v2 by allowing ID based matching on egress. This can in particular be used in combination with applying policy e.g. from map lookups, and also complements the older bpf_skb_under_cgroup() interface. In user space the cgroup id for a given path can be retrieved through the f_handle as demonstrated in [0] recently. [0] https://lkml.org/lkml/2018/5/22/1190 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-03  bpf: avoid retpoline for lookup/update/delete calls on maps  Daniel Borkmann  3  -22/+61
While some of the BPF map lookup helpers provide a ->map_gen_lookup() callback for inlining the map lookup altogether it is not available for every map, so the remaining ones have to call bpf_map_lookup_elem() helper which does a dispatch to map->ops->map_lookup_elem(). In times of retpolines, this will control and trap speculative execution rather than letting it do its work for the indirect call and will therefore cause a slowdown. Likewise, bpf_map_update_elem() and bpf_map_delete_elem() do not have an inlined version and need to call into their map->ops->map_update_elem() resp. map->ops->map_delete_elem() handlers.

Before:

  # bpftool prog dump xlated id 1
    0: (bf) r2 = r10
    1: (07) r2 += -8
    2: (7a) *(u64 *)(r2 +0) = 0
    3: (18) r1 = map[id:1]
    5: (85) call __htab_map_lookup_elem#232656
    6: (15) if r0 == 0x0 goto pc+4
    7: (71) r1 = *(u8 *)(r0 +35)
    8: (55) if r1 != 0x0 goto pc+1
    9: (72) *(u8 *)(r0 +35) = 1
   10: (07) r0 += 56
   11: (15) if r0 == 0x0 goto pc+4
   12: (bf) r2 = r0
   13: (18) r1 = map[id:1]
   15: (85) call bpf_map_delete_elem#215008  <-- indirect call via helper
   16: (95) exit

After:

  # bpftool prog dump xlated id 1
    0: (bf) r2 = r10
    1: (07) r2 += -8
    2: (7a) *(u64 *)(r2 +0) = 0
    3: (18) r1 = map[id:1]
    5: (85) call __htab_map_lookup_elem#233328
    6: (15) if r0 == 0x0 goto pc+4
    7: (71) r1 = *(u8 *)(r0 +35)
    8: (55) if r1 != 0x0 goto pc+1
    9: (72) *(u8 *)(r0 +35) = 1
   10: (07) r0 += 56
   11: (15) if r0 == 0x0 goto pc+4
   12: (bf) r2 = r0
   13: (18) r1 = map[id:1]
   15: (85) call htab_lru_map_delete_elem#238240  <-- direct call
   16: (95) exit

In all three lookup/update/delete cases however we can use the actual address of the map callback directly if we find that there's only a single path with a map pointer leading to the helper call, meaning when the map pointer has not been poisoned from verifier side. Example code can be seen above for the delete case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
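The difference between dispatching through map->ops on every call and resolving the callback once can be sketched in user-space C. This is only an analogy: all `example_*` names are hypothetical, and plain C cannot show the rewritten call instruction that the verifier emits; it only shows where the lookup of the target happens.

```c
#include <stddef.h>

struct example_map;

/* Per-map-type callbacks, loosely modelled on map->ops. */
struct example_map_ops {
	int (*delete_elem)(struct example_map *map, int key);
};

struct example_map {
	const struct example_map_ops *ops;
	int deleted;		/* counts deletions, for demonstration */
};

typedef int (*example_delete_fn)(struct example_map *map, int key);

/* One concrete map implementation. */
int example_lru_delete(struct example_map *map, int key)
{
	(void)key;
	map->deleted++;
	return 0;
}

const struct example_map_ops example_lru_ops = {
	.delete_elem = example_lru_delete,
};

/* Generic helper: every call dispatches through map->ops. */
int example_map_delete_elem(struct example_map *map, int key)
{
	return map->ops->delete_elem(map, key);
}

/* Resolve the target once, for the case where only a single, known
 * map can reach the call site. Returns NULL if no map is given. */
example_delete_fn example_resolve_delete(const struct example_map *map)
{
	return map ? map->ops->delete_elem : NULL;
}
```

When more than one map could reach the same call site, no single target exists and the generic helper has to stay in place.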
2018-06-03  bpf: show prog and map id in fdinfo  Daniel Borkmann  1  -4/+8
It's trivial and straightforward to expose it for scripts that can then use it along with bpftool in order to inspect an individual application's used maps and progs. Right now we dump some basic information in the fdinfo file but with the help of the map/prog id full introspection becomes possible now. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-03  bpf: fixup error message from gpl helpers on license mismatch  Daniel Borkmann  1  -1/+1
Stating 'proprietary program' in the error is just silly since it can also be a different open source license than that which is just not compatible. Reference: https://twitter.com/majek04/status/998531268039102465 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-03  bpf: add also cbpf long jump test cases with heavy expansion  Daniel Borkmann  1  -0/+63
We have one triggering on eBPF but let's also add a cBPF example to make sure we keep tracking them. Also add another cBPF test running max number of MSH ops. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-03  bpf: test case for map pointer poison with calls/branches  Daniel Borkmann  3  -27/+178
Add several test cases where the same or different map pointers originate from different paths in the program and execute a map lookup or tail call at a common location. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-02  Merge branch 'btf-fixes'  Alexei Starovoitov  2  -1/+70
Martin KaFai Lau says: ==================== ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-02  bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD  Martin KaFai Lau  2  -1/+42
The t->type in BTF_KIND_FWD is not used. It must be 0. This patch ensures that and also adds a test case in test_btf.c Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-02  bpf: btf: Check array t->size  Martin KaFai Lau  2  -0/+28
This patch ensures array's t->size is 0. The array size is decided by its individual elem's size and the number of elements. Hence, t->size is not used and it must be 0. A test case is added to test_btf.c Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
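The kind of metadata check described for arrays, and for BTF_KIND_FWD in the previous commit, can be sketched as follows. This is not the kernel's BTF code: `struct example_type`, the `EXAMPLE_KIND_*` values, the single `size_or_type` field standing in for t->size and t->type, and the -EINVAL return value are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

enum example_kind {
	EXAMPLE_KIND_INT,
	EXAMPLE_KIND_ARRAY,
	EXAMPLE_KIND_FWD,
};

/* Hypothetical, simplified type descriptor. */
struct example_type {
	enum example_kind kind;
	uint32_t size_or_type;	/* stands in for t->size / t->type */
};

/* Reject descriptors that set a field their kind does not use. */
int example_check_meta(const struct example_type *t)
{
	switch (t->kind) {
	case EXAMPLE_KIND_ARRAY:	/* size derives from the elements */
	case EXAMPLE_KIND_FWD:		/* forward declarations refer to nothing */
		return t->size_or_type == 0 ? 0 : -EINVAL;
	default:
		return 0;
	}
}
```

Rejecting non-zero values in unused fields keeps them available for later use without breaking existing, valid input.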
2018-06-02  Merge branch 'bpf-sockmap-test-fixes'  Daniel Borkmann  1  -20/+67
Prashant Bhole says:

====================
test_sockmap was originally written only to exercise kernel code paths, so there was no strict checking of errors. When the code was modified to run as selftests, due to lack of error handling it was not able to detect test failures.

In order to improve, this series fixes error handling, test run time and data verification. Also slightly improved test output by printing parameter values (cork, apply, start, end) so that parameters for all tests are displayed.

Changes in v4:
- patch1: Ignore RX timeout error only for corked tests
- patch3: Setting different timeout for corked tests and reduce run time by reducing number of iterations in some tests

Changes in v3:
- Skipped error checking for corked tests
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-02  selftests/bpf: test_sockmap, print additional test options  Prashant Bhole  1  -9/+19
Print values of test options like apply, cork, start, end so that individual failed tests can be identified for manual run Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-02  selftests/bpf: test_sockmap, fix data verification  Prashant Bhole  1  -1/+13
When data verification is enabled, some tests fail because verification is done incorrectly. Following changes fix it.
- Identify the size of data block to be verified
- Reset verification counter when data block size is reached
- Fixed the value printed in case of verification failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-02  selftests/bpf: test_sockmap, timing improvements  Prashant Bhole  1  -4/+9
Currently 10us delay is too low for many tests to succeed. It needs to be increased. Also, many corked tests are expected to hit rx timeout irrespective of timeout value.
- This patch sets 1000usec timeout value for corked tests because less than that causes broken-pipe error in tx thread. Also sets 1 second timeout for all other tests because less than that results in RX timeout
- Tests with apply=1 and higher number of iterations were taking a lot of time. This patch reduces test run time by reducing iterations.

  real    0m12.968s
  user    0m0.219s
  sys     0m14.337s

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-02  selftests/bpf: test_sockmap, join cgroup in selftest mode  Prashant Bhole  1  -0/+5
In case of selftest mode, temporary cgroup environment is created but cgroup is not joined. It causes test failures. Fixed by joining the cgroup Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-02  selftests/bpf: test_sockmap, check test failure  Prashant Bhole  1  -6/+21
Test failures are not identified because exit code of RX/TX threads is not checked. Also threads are not returning correct exit code.
- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads
- Skip error checking for corked tests as they are expected to timeout

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-31  bpf: Change bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address families  David Ahern  1  -2/+2
Update bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address families. Allows userspace to probe for support as more are added (e.g., AF_MPLS). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
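Why a dedicated error code helps probing can be sketched in plain C. The functions below are hypothetical stand-ins, not the bpf_fib_lookup() helper itself: `example_fib_lookup()` treats only AF_INET and AF_INET6 as supported and returns 0 for them, which is an assumption made for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical lookup: supports IPv4 and IPv6 only. */
int example_fib_lookup(int family)
{
	switch (family) {
	case AF_INET:
	case AF_INET6:
		return 0;
	default:
		/* Distinct code so callers can tell "family not
		 * supported" apart from other failures. */
		return -EAFNOSUPPORT;
	}
}

/* Probe: true if the lookup knows about 'family'. */
bool example_family_supported(int family)
{
	return example_fib_lookup(family) != -EAFNOSUPPORT;
}
```

A caller can run the probe once at start-up and fall back to another path for families that are not handled yet.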
2018-05-31  bpf: devmap: remove redundant assignment of dev = dev  Colin Ian King  1  -1/+1
The assignment dev = dev is redundant and should be removed. Detected by CoverityScan, CID#1469486 ("Evaluation order violation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-30  bpftool: Support sendmsg{4,6} attach types  Andrey Ignatov  3  -5/+13
Add support for recently added BPF_CGROUP_UDP4_SENDMSG and BPF_CGROUP_UDP6_SENDMSG attach types to bpftool, update documentation and bash completion. Signed-off-by: Andrey Ignatov <rdna@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-30  Merge branch 'bpf-ir-decoder'  Daniel Borkmann  22  -22/+984
Sean Young says:

====================
The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most widely used IR protocols, but there are many protocols which are not supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes, many of which are not supported by rc-core. There is a "long tail" of unsupported IR protocols, for which lircd is needed to decode the IR.

IR encoding is done in such a way that some simple circuit can decode it; therefore, bpf is ideal.

In order to support all these protocols, here we have bpf based IR decoding. The idea is that user-space can define a decoder in bpf, attach it to the rc device through the lirc chardev.

Separate work is underway to extend ir-keytable to have an extensive library of bpf-based decoders, and a much expanded library of rc keymaps.

Another future application would be to compile IRP[3] to an IR BPF program, and so support virtually every remote without having to write a decoder for each. It might also be possible to support non-button devices such as analog directional pads or air conditioning remote controls and decode the target temperature in bpf, and pass that to an input device.

[1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
[2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
[3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation

Changes since v4:
- Renamed rc_dev_bpf_{attach,detach,query} to lirc_bpf_{attach,detach,query}
- Fixed error path in lirc_bpf_query
- Rebased on bpf-next

Changes since v3:
- Implemented review comments from Quentin Monnet and Y Song (thanks!)
- More helpful and better formatted bpf helper documentation
- Changed back to bpf_prog_array rather than open-coded implementation
- scancodes can be 64 bit
- bpf gets passed values in microseconds, not nanoseconds. Microseconds is more than enough (IR receivers support carriers up to 70kHz, at which point a single period is already 14 microseconds). Also, this makes it much more consistent with lirc mode2.
- Since it looks much more like lirc mode2, rename the program type to BPF_PROG_TYPE_LIRC_MODE2.
- Rebased on bpf-next

Changes since v2:
- Fixed locking issues
- Improved self-test to cover more cases
- Rebased on bpf-next again

Changes since v1:
- Code review comments from Y Song <ys114321@gmail.com> and Randy Dunlap <rdunlap@infradead.org>
- Re-wrote sample bpf to be selftest
- Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
- Rebase on bpf-next
- Introduced bpf_rawir_event context structure with simpler access checking
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
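The microsecond argument in the v3 changelog is simple arithmetic: one period of a 70 kHz carrier lasts 1/70000 s, roughly 14.3 microseconds. A small sketch of that calculation follows; `carrier_period_us()` is a hypothetical helper and not part of the series.

```c
#include <stdint.h>

/* Period of one carrier cycle in whole microseconds, rounded down.
 * Returns 0 for a zero frequency instead of dividing by zero. */
uint32_t carrier_period_us(uint32_t carrier_hz)
{
	return carrier_hz ? 1000000u / carrier_hz : 0;
}
```

Even at the highest supported carrier frequency a single period already spans many microseconds, so nanosecond resolution adds nothing for pulse and space lengths.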
2018-05-30  bpf: add selftest for lirc_mode2 type program  Sean Young  10  -17/+494
This is simple test over rc-loopback. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-30  media: rc: introduce BPF_PROG_LIRC_MODE2  Sean Young  10  -3/+479
Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call rc_keydown() to report decoded IR scancodes, or rc_repeat() to report that the last key should be repeated. The bpf program can be attached using the bpf(BPF_PROG_ATTACH) syscall; the target_fd must be the /dev/lircN device. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-30  bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found  Sean Young  2  -2/+11
This makes it possible for bpf prog detach to return -ENOENT. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
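The behaviour described in the subject line can be sketched with a plain array copy. This is not bpf_prog_array_copy(): the function below works on integer ids, treats id 0 as "nothing to exclude", and all names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Copy 'n' ids from 'src' to 'dst', dropping entries equal to
 * 'exclude_id' (0 means "exclude nothing"). 'dst' must have room
 * for 'n' entries. The number of ids copied is stored in *copied.
 * Returns -ENOENT if an id to exclude was given but never found. */
int example_array_copy(const int *src, size_t n, int exclude_id,
		       int *dst, size_t *copied)
{
	size_t i, out = 0;
	int found = 0;

	for (i = 0; i < n; i++) {
		if (exclude_id && src[i] == exclude_id) {
			found = 1;
			continue;
		}
		dst[out++] = src[i];
	}
	*copied = out;

	if (exclude_id && !found)
		return -ENOENT;
	return 0;
}
```

With this return value a detach request for a program that was never attached can be reported to the caller instead of succeeding silently.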
2018-05-29  bpf: Verify flags in bpf_fib_lookup  David Ahern  1  -0/+6
Verify flags argument contains only known flags. Allows programs to probe for support as more are added. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
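A common way to reject unknown flag bits is to mask the argument against the set of known flags. The sketch below shows the pattern in isolation; the `EXAMPLE_LOOKUP_*` names and values and the -EINVAL return are assumptions for illustration, not the helper's actual definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits. */
#define EXAMPLE_LOOKUP_DIRECT	(1U << 0)
#define EXAMPLE_LOOKUP_OUTPUT	(1U << 1)
#define EXAMPLE_LOOKUP_KNOWN	(EXAMPLE_LOOKUP_DIRECT | EXAMPLE_LOOKUP_OUTPUT)

/* Return 0 if 'flags' contains only known bits, -EINVAL otherwise. */
int example_check_flags(uint32_t flags)
{
	if (flags & ~EXAMPLE_LOOKUP_KNOWN)
		return -EINVAL;
	return 0;
}
```

Because unknown bits fail today, a program can later test for a newly added flag by setting it and checking whether the call is rejected.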
2018-05-29  bpf: Drop mpls from bpf_fib_lookup  David Ahern  1  -13/+13
MPLS support will not be submitted this dev cycle, but in working on it I do see a few changes are needed to the API. For now, drop mpls from the API. Since the fields in question are unions, the mpls fields can be added back later without affecting the uapi. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-29  bpf: hide the unused 'off' variable  YueHaibing  1  -0/+2
The local variable is only used while CONFIG_IPV6 enabled

  net/core/filter.c: In function ‘sk_msg_convert_ctx_access’:
  net/core/filter.c:6489:6: warning: unused variable ‘off’ [-Wunused-variable]
    int off;
        ^

This puts it into #ifdef.

Fixes: 303def35f64e ("bpf: allow sk_msg programs to read sock fields")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-29  bpf: clean up eBPF helpers documentation  Quentin Monnet  2  -22/+20
These are minor edits for the eBPF helpers documentation in include/uapi/linux/bpf.h. The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends with a non-escaped underscore that gets interpreted by rst2man and produces the following message in the resulting manual page: DOCUTILS SYSTEM MESSAGES System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514) Unknown target name: "bpf_fib_lookup". Other edits consist in: - Improving formatting for flag values for "bpf_fib_lookup()" helper. - Emphasising a parameter name in description of the return value for "bpf_get_stack()" helper. - Removing unnecessary blank lines between "Description" and "Return" sections for the few helpers that would use it, for consistency. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28  Merge branch 'bpf-sendmsg-hook'  Daniel Borkmann  14  -196/+1215
Andrey Ignatov says:

====================
v3 -> v4:
* handle static key correctly for CONFIG_CGROUP_BPF=n.

v2 -> v3:
* place BPF logic under static key in udp_sendmsg, udpv6_sendmsg;
* rebase.

v1 -> v2:
* return ENOTSUPP if bpf_prog rewrote IPv6-only with IPv4-mapped IPv6;
* add test for IPv4-mapped IPv6 use-case;
* fix build for CONFIG_CGROUP_BPF=n;
* rebase.

This patch set adds BPF hooks for sys_sendmsg similar to existing hooks for sys_bind and sys_connect.

Hooks allow to override source IP (including the case when it's set via cmsg(3)) and destination IP:port for unconnected UDP (slow path). TCP and connected UDP (fast path) are not affected. This makes UDP support complete: connected UDP is handled by sys_connect hooks, unconnected by sys_sendmsg ones.

Similar to sys_connect hooks, sys_sendmsg ones can be used to make system calls such as sendmsg(2) and sendto(2) return EPERM.

Please see patch 0002 for more details.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28  selftests/bpf: Selftest for sys_sendmsg hooks  Andrey Ignatov  4  -1/+628
Add selftest for BPF_CGROUP_UDP4_SENDMSG and BPF_CGROUP_UDP6_SENDMSG attach types. Try to sendmsg(2) to specific IP:port and test that: * source IP is overridden as expected. * remote IP:port pair is overridden as expected; Both UDPv4 and UDPv6 are tested. Output: # test_sock_addr.sh 2>/dev/null Wait for testing IPv4/IPv6 to become available ... OK ... pre-existing test-cases skipped ... Test case: sendmsg4: load prog with wrong expected attach type .. [PASS] Test case: sendmsg4: attach prog with wrong attach type .. [PASS] Test case: sendmsg4: rewrite IP & port (asm) .. [PASS] Test case: sendmsg4: rewrite IP & port (C) .. [PASS] Test case: sendmsg4: deny call .. [PASS] Test case: sendmsg6: load prog with wrong expected attach type .. [PASS] Test case: sendmsg6: attach prog with wrong attach type .. [PASS] Test case: sendmsg6: rewrite IP & port (asm) .. [PASS] Test case: sendmsg6: rewrite IP & port (C) .. [PASS] Test case: sendmsg6: IPv4-mapped IPv6 .. [PASS] Test case: sendmsg6: deny call .. [PASS] Summary: 27 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28  selftests/bpf: Prepare test_sock_addr for extension  Andrey Ignatov  1  -195/+460
test_sock_addr was not easy to extend since it was focused on sys_bind and sys_connect quite a bit. Reorganized it so that it'll be easier to cover new test-cases for `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`:
- decouple test-cases so that only one BPF prog is tested at a time;
- check programmatically that local IP:port for sys_bind, source IP and destination IP:port for sys_connect are rewritten properly by tested BPF programs.

The output of new version:

  # test_sock_addr.sh 2>/dev/null
  Wait for testing IPv4/IPv6 to become available ... OK
  Test case: bind4: load prog with wrong expected attach type .. [PASS]
  Test case: bind4: attach prog with wrong attach type .. [PASS]
  Test case: bind4: rewrite IP & TCP port in .. [PASS]
  Test case: bind4: rewrite IP & UDP port in .. [PASS]
  Test case: bind6: load prog with wrong expected attach type .. [PASS]
  Test case: bind6: attach prog with wrong attach type .. [PASS]
  Test case: bind6: rewrite IP & TCP port in .. [PASS]
  Test case: bind6: rewrite IP & UDP port in .. [PASS]
  Test case: connect4: load prog with wrong expected attach type .. [PASS]
  Test case: connect4: attach prog with wrong attach type .. [PASS]
  Test case: connect4: rewrite IP & TCP port .. [PASS]
  Test case: connect4: rewrite IP & UDP port .. [PASS]
  Test case: connect6: load prog with wrong expected attach type .. [PASS]
  Test case: connect6: attach prog with wrong attach type .. [PASS]
  Test case: connect6: rewrite IP & TCP port .. [PASS]
  Test case: connect6: rewrite IP & UDP port .. [PASS]
  Summary: 16 PASSED, 0 FAILED

(stderr contains errors from libbpf when testing load/attach with invalid arguments)

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28  libbpf: Support guessing sendmsg{4,6} progs  Andrey Ignatov  1  -0/+2
libbpf can guess prog type and expected attach type based on section name. Add hints for "cgroup/sendmsg4" and "cgroup/sendmsg6" section names. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
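Guessing from a section name amounts to a table lookup. The sketch below shows one way such a table could look; it is not libbpf's implementation, and the enum, the table, and `example_guess_attach_type()` are hypothetical names. Only the two section name strings are taken from the commit message.

```c
#include <stddef.h>
#include <string.h>

enum example_attach_type {
	EXAMPLE_ATTACH_UNKNOWN = 0,
	EXAMPLE_ATTACH_UDP4_SENDMSG,
	EXAMPLE_ATTACH_UDP6_SENDMSG,
};

/* Section name prefix -> attach type hints. */
static const struct {
	const char *sec;
	enum example_attach_type type;
} example_section_hints[] = {
	{ "cgroup/sendmsg4", EXAMPLE_ATTACH_UDP4_SENDMSG },
	{ "cgroup/sendmsg6", EXAMPLE_ATTACH_UDP6_SENDMSG },
};

/* Return the attach type whose prefix matches 'sec_name', or
 * EXAMPLE_ATTACH_UNKNOWN if no hint applies. */
enum example_attach_type example_guess_attach_type(const char *sec_name)
{
	size_t i, n = sizeof(example_section_hints) /
		      sizeof(example_section_hints[0]);

	for (i = 0; i < n; i++) {
		const char *sec = example_section_hints[i].sec;

		if (strncmp(sec_name, sec, strlen(sec)) == 0)
			return example_section_hints[i].type;
	}
	return EXAMPLE_ATTACH_UNKNOWN;
}
```

Adding support for a new section name then only requires one more table entry.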
2018-05-28  bpf: Sync bpf.h to tools/  Andrey Ignatov  1  -0/+8
Sync new `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` attach types to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28  bpf: Hooks for sys_sendmsg  Andrey Ignatov  8  -9/+125
In addition to already existing BPF hooks for sys_bind and sys_connect, the patch provides new hooks for sys_sendmsg.

It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that provides access to socket itself (properties like family, type, protocol) and user-passed `struct sockaddr *` so that BPF program can override destination IP and port for system calls such as sendto(2) or sendmsg(2) and/or assign source IP to the socket.

The hooks are implemented as two new attach types: `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and UDPv6 correspondingly.

UDPv4 and UDPv6 have separate attach types for the same reason as sys_bind and sys_connect hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.

The difference with already existing hooks is that sys_sendmsg hooks are implemented only for unconnected UDP.

For TCP it doesn't make sense to change user-provided `struct sockaddr *` at sendto(2)/sendmsg(2) time since socket either was already connected and has source/destination set or wasn't connected and call to sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.

Connected UDP is already handled by sys_connect hooks that can override source/destination at connect time and use fast-path later, i.e. these hooks don't affect UDP fast-path.

Rewriting source IP is implemented differently than that in sys_connect hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to just bind socket to desired local IP address since source IP can be set on per-packet basis by using ancillary data (cmsg(3)). So no matter if socket is bound or not, source IP has to be rewritten on every call to sys_sendmsg.

To do so two new fields are added to UAPI `struct bpf_sock_addr`:
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
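What a UDPv4 sendmsg hook does to its context can be sketched as ordinary user-space C. This is a simulation, not a loadable BPF program: `struct example_sock_addr` is a hypothetical, trimmed-down mirror that borrows the field names `user_ip4`, `user_port` and `msg_src_ip4`, the addresses and port are made up, and the "1 allows, 0 denies" return convention is an assumption of this sketch.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical mirror of the fields a sendmsg4 hook may rewrite.
 * All values are kept in network byte order. */
struct example_sock_addr {
	uint32_t user_ip4;	/* destination IPv4 address */
	uint32_t user_port;	/* destination port */
	uint32_t msg_src_ip4;	/* source IPv4 address for this message */
};

enum { EXAMPLE_DENY = 0, EXAMPLE_ALLOW = 1 };

/* Rewrite source and destination for one unconnected UDP send. */
int example_sendmsg4(struct example_sock_addr *ctx)
{
	ctx->msg_src_ip4 = htonl(0x7f000004);	/* 127.0.0.4 */
	ctx->user_ip4 = htonl(0x7f000001);	/* 127.0.0.1 */
	ctx->user_port = htons(4444);
	return EXAMPLE_ALLOW;
}
```

Because the source address is part of the per-call context, it is rewritten on every send rather than once at bind or connect time.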
2018-05-28  bpf: Define cgroup_bpf_enabled for CONFIG_CGROUP_BPF=n  Andrey Ignatov  1  -0/+1
Static key is used to enable/disable cgroup-bpf related code paths at run time.

Though it's not defined when cgroup-bpf is disabled at compile time, i.e. CONFIG_CGROUP_BPF=n, and if some code wants to use it, it has to do this:

  #ifdef CONFIG_CGROUP_BPF
  	if (cgroup_bpf_enabled) {
  		/* ... some work ... */
  	}
  #endif

This code can be simplified by setting cgroup_bpf_enabled to 0 for CONFIG_CGROUP_BPF=n case:

  	if (cgroup_bpf_enabled) {
  		/* ... some work ... */
  	}

And it aligns well with existing BPF_CGROUP_RUN_PROG_* macros that are defined for both states of CONFIG_CGROUP_BPF.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
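The same pattern can be reproduced outside the kernel with an ordinary macro. In the sketch below `EXAMPLE_CONFIG_CGROUP_BPF` is a hypothetical stand-in for the Kconfig option and a plain int replaces the static key; when the option is off, the symbol becomes the constant 0 and the compiler can drop the guarded branch.

```c
#ifdef EXAMPLE_CONFIG_CGROUP_BPF
/* Feature compiled in: a run-time switch (a plain int here, where
 * the kernel uses a static key). */
static int example_cgroup_bpf_enabled = 1;
#else
/* Feature compiled out: the check folds to a constant. */
#define example_cgroup_bpf_enabled 0
#endif

/* Caller needs no #ifdef around the check. Returns 1 if the hook
 * path ran, 0 otherwise. */
int example_send_path(void)
{
	int hooks_run = 0;

	if (example_cgroup_bpf_enabled)
		hooks_run = 1;	/* ... some work ... */

	return hooks_run;
}
```

Built without `-DEXAMPLE_CONFIG_CGROUP_BPF`, the function always returns 0; built with it, the hook path is taken.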
2018-05-28  selftests/bpf: missing headers test_lwt_seg6local  Mathieu Xhonneux  2  -0/+135
Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks some UAPI headers in tools/. clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include -idirafter /usr/include -Wno-compare-distinct-pointer-types \ -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - | \ llc -march=bpf -mcpu=generic -filetype=obj -o [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found ^~~~~~~~~~~~~~~~~~~~ 1 error generated. make: Leaving directory `/data/users/yhs/work/net-next/tools/testing/selftests/bpf' v2: moving the headers to tools/include/uapi/. Reported-by: Y Song <ys114321@gmail.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28  bpf: avoid -Wmaybe-uninitialized warning  Arnd Bergmann  1  -4/+3
The stack_map_get_build_id_offset() function is too long for gcc to track whether 'work' may or may not be initialized at the end of it, leading to a false-positive warning: kernel/bpf/stackmap.c: In function 'stack_map_get_build_id_offset': kernel/bpf/stackmap.c:334:13: error: 'work' may be used uninitialized in this function [-Werror=maybe-uninitialized] This removes the 'in_nmi_ctx' flag and uses the state of that variable itself to see if it got initialized. Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
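The approach of letting the pointer itself carry the "was it set up" information can be sketched as follows. This is a simplified illustration with hypothetical names (`struct example_work`, `example_release()`), not the code in kernel/bpf/stackmap.c.

```c
#include <stddef.h>

struct example_work {
	int queued;
};

static struct example_work example_percpu_work;

/* Returns 1 if the release was deferred, 0 if it was done directly. */
int example_release(int in_nmi)
{
	/* NULL doubles as "not in NMI context", so no separate flag
	 * has to be kept in sync with the pointer. */
	struct example_work *work = NULL;

	if (in_nmi)
		work = &example_percpu_work;

	/* ... long function body ... */

	if (!work)
		return 0;	/* normal context: release directly */

	work->queued = 1;	/* NMI context: defer the release */
	return 1;
}
```

Since the pointer is initialized at its declaration and tested directly, the compiler no longer has to correlate a separate flag with it across a long function.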
2018-05-28  bpf: btf: avoid -Wreturn-type warning  Arnd Bergmann  1  -1/+1
gcc warns about a noreturn function possibly returning in some configurations: kernel/bpf/btf.c: In function 'env_type_is_resolve_sink': kernel/bpf/btf.c:729:1: error: control reaches end of non-void function [-Werror=return-type] Using BUG() instead of BUG_ON() avoids that warning and otherwise does the exact same thing. Fixes: eb3f595dab40 ("bpf: btf: Validate type reference") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
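A user-space analogue of the same warning: when the last statement of a non-void function is an unconditional call to a function declared noreturn, the compiler can see that the end of the function is unreachable. The sketch uses abort() in place of BUG(); the enum and function names are hypothetical.

```c
#include <stdlib.h>

enum example_mode {
	EXAMPLE_MODE_A,
	EXAMPLE_MODE_B,
};

/* Every path either returns a value or ends in abort(), which is
 * declared as not returning, so no "control reaches end of non-void
 * function" diagnostic is expected here. */
int example_is_sink(enum example_mode mode, int kind)
{
	switch (mode) {
	case EXAMPLE_MODE_A:
		return kind == 0;
	case EXAMPLE_MODE_B:
		return kind != 0;
	default:
		abort();	/* plays the role of BUG() */
	}
}
```

A conditional form such as an assertion macro may or may not be seen as terminating, depending on how it expands in a given configuration, which is where the warning can come from.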
2018-05-28  libbpf: Install btf.h with libbpf  Andrey Ignatov  1  -0/+1
install_headers target should contain all headers that are part of libbpf. Add missing btf.h Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-27  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  200  -874/+2625
Lots of easy overlapping changes in the conflict resolutions here. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds  16  -43/+125
Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kasan: fix memory hotplug during boot kasan: free allocated shadow memory on MEM_CANCEL_ONLINE checkpatch: fix macro argument precedence test init/main.c: include <linux/mem_encrypt.h> kernel/sys.c: fix potential Spectre v1 issue mm/memory_hotplug: fix leftover use of struct page during hotplug proc: fix smaps and meminfo alignment mm: do not warn on offline nodes unless the specific node is explicitly requested mm, memory_hotplug: make has_unmovable_pages more robust mm/kasan: don't vfree() nonexistent vm_area MAINTAINERS: change hugetlbfs maintainer and update files ipc/shm: fix shmat() nil address after round-down when remapping Revert "ipc/shm: Fix shmat mmap nil-page protection" idr: fix invalid ptr dereference on item delete ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" mm: fix nr_rotate_swap leak in swapon() error case
2018-05-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 49 | -193/+372
Pull networking fixes from David Miller:
 "Let's begin the holiday weekend with some networking fixes:

   1) Whoops need to restrict cfg80211 wiphy names even more to 64 bytes. From Eric Biggers.
   2) Fix flags being ignored when using kernel_connect() with SCTP, from Xin Long.
   3) Use after free in DCCP, from Alexey Kodanev.
   4) Need to check rhltable_init() return value in ipmr code, from Eric Dumazet.
   5) XDP handling fixes in virtio_net from Jason Wang.
   6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
   7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack Morgenstein.
   8) Prevent out-of-bounds speculation using indexes in BPF, from Daniel Borkmann.
   9) Fix regression added by AF_PACKET link layer cure, from Willem de Bruijn.
  10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
  11) Missing config options for PMTU tests, from Stefano Brivio"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
  ibmvnic: Fix partial success login retries
  selftests/net: Add missing config options for PMTU tests
  mlx4_core: allocate ICM memory in page size chunks
  enic: set DMA mask to 47 bit
  ppp: remove the PPPIOCDETACH ioctl
  ipv4: remove warning in ip_recv_error
  net : sched: cls_api: deal with egdev path only if needed
  vhost: synchronize IOTLB message with dev cleanup
  packet: fix reserve calculation
  net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
  net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
  bpf: properly enforce index mask to prevent out-of-bounds speculation
  net/mlx4: Fix irq-unsafe spinlock usage
  net: phy: broadcom: Fix bcm_write_exp()
  net: phy: broadcom: Fix auxiliary control register reads
  net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
  net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
  ibmvnic: Only do H_EOI for mobility events
  tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
  virtio-net: fix leaking page for gso packet during mergeable XDP
  ...
2018-05-26 | kasan: fix memory hotplug during boot | David Hildenbrand | 1 | -1/+1
Using module_init() is wrong. E.g. ACPI adds and onlines memory before our memory notifier gets registered. This makes sure that ACPI memory detected during boot up will not result in a kernel crash. Easily reproducible with QEMU, just specify a DIMM when starting up. Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com Fixes: 786a8959912e ("kasan: disable memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26 | kasan: free allocated shadow memory on MEM_CANCEL_ONLINE | David Hildenbrand | 1 | -0/+1
We have to free memory again when we cancel onlining, otherwise a later onlining attempt will fail. Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26 | checkpatch: fix macro argument precedence test | Joe Perches | 1 | -1/+1
checkpatch's macro argument precedence test is broken so fix it. Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26 | init/main.c: include <linux/mem_encrypt.h> | Mathieu Malaterre | 1 | -0/+1
In commit c7753208a94c ("x86, swiotlb: Add memory encryption support") a call to function `mem_encrypt_init' was added. Include prototype defined in header <linux/mem_encrypt.h> to prevent a warning reported during compilation with W=1: init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes] Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26 | kernel/sys.c: fix potential Spectre v1 issue | Gustavo A. R. Silva | 1 | -0/+5
`resource' can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index current->signal->rlim

Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
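The kind of index sanitizing described above can be sketched in userspace C. In the kernel this job is done by array_index_nospec() from <linux/nospec.h>; the code below is not that implementation, only a minimal mask-based analogue under stated assumptions. The names `demo_index_mask()`, `demo_read_limit()`, `demo_limits` and `DEMO_NLIMITS` are hypothetical, and the sketch relies on arithmetic right shift of negative values and wrapping unsigned-to-signed conversion, both of which are implementation-defined in C (gcc and clang behave as assumed).

```c
#include <assert.h>
#include <limits.h>

#define DEMO_NLIMITS 16UL	/* hypothetical table size */

static long demo_limits[DEMO_NLIMITS];

/*
 * Return all ones when index < size and 0 otherwise, without a branch.
 * If index >= size, either index itself or (size - 1 - index) has its top
 * bit set, so the OR is negative as a signed value and its complement
 * shifts down to 0.
 */
unsigned long demo_index_mask(unsigned long index, unsigned long size)
{
	assert(size > 0 && size <= (unsigned long)LONG_MAX);
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-check as usual, then clamp the index so that a mispredicted
 * bounds check cannot be followed by an out-of-range load. */
long demo_read_limit(unsigned long resource)
{
	if (resource >= DEMO_NLIMITS)
		return -1;
	resource &= demo_index_mask(resource, DEMO_NLIMITS);
	return demo_limits[resource];
}
```

The clamp is applied once, right after the bounds check and before the first load, which matches the policy quoted above of stopping speculation at the first load.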
2018-05-26 | mm/memory_hotplug: fix leftover use of struct page during hotplug | Jonathan Cameron | 3 | -6/+9
The case of a new numa node got missed in avoiding using the node info from page_struct during hotplug. In this path we have a call to register_mem_sect_under_node (which allows us to specify it is hotplug so don't change the node), via link_mem_sections which unfortunately does not.

Fix is to pass check_nid through link_mem_sections as well and disable it in the new numa node path.

Note the bug only 'sometimes' manifests depending on what happens to be in the struct page structures - there are lots of them and it only needs to match one of them.

The result of the bug is that (with a new memory only node) we never successfully call register_mem_sect_under_node so don't get the memory associated with the node in sysfs and meminfo for the node doesn't report it.

It came up whilst testing some arm64 hotplug patches, but appears to be universal. Whilst I'm triggering it by removing then reinserting memory to a node with no other elements (thus making the node disappear then appear again), it appears it would happen on hotplugging memory where there was none before and it doesn't seem to be related to the arm64 patches.

These patches call __add_pages (where most of the issue was fixed by Pavel's patch). If there is a node at the time of the __add_pages call then all is well as it calls register_mem_sect_under_node from there with check_nid set to false. Without a node that function returns having not done the sysfs related stuff as there is no node to use. This is expected but it is the resulting path that fails...
Exact path to the problem is as follows:

mm/memory_hotplug.c: add_memory_resource()
   The node is not online so we enter the 'if (new_node)' twice, on the second such block there is a call to link_mem_sections which calls into
drivers/node.c: link_mem_sections() which calls
drivers/node.c: register_mem_sect_under_node() which calls get_nid_for_pfn and keeps trying until the output of that matches the expected node (passed all the way down from add_memory_resource)

It is effectively the same fix as the one referred to in the fixes tag just in the code path for a new node where the comments point out we have to rerun the link creation because it will have failed in register_new_memory (as there was no node at the time). (actually that comment is wrong now as we don't have register_new_memory any more it got renamed to hotplug_memory_register in Pavel's patch).

Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26 | proc: fix smaps and meminfo alignment | Hugh Dickins | 1 | -5/+0
The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit numbers (commonly 0) are misaligned. Remove seq_put_decimal_ull_width()'s leftover optimization for single digits: it's wrong now that num_to_str() takes care of the width. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils Fixes: d1be35cb6f96 ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
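Why a single-digit shortcut breaks column alignment can be sketched in userspace C. This is not the kernel's seq_file code: `demo_put_shortcut()` and `demo_put_padded()` are hypothetical names, and snprintf() with a `%*llu` width stands in for a formatter that already handles padding, as num_to_str() is described as doing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Variant with a leftover fast path: single digits are written directly,
 * so the requested width is ignored for them and the column is misaligned.
 */
int demo_put_shortcut(char *buf, size_t size, unsigned long long num, int width)
{
	assert(buf != NULL && size >= 2);
	if (num < 10) {
		buf[0] = (char)('0' + num);
		buf[1] = '\0';
		return 1;
	}
	return snprintf(buf, size, "%*llu", width, num);
}

/* Variant without the fast path: the formatter pads every value to width. */
int demo_put_padded(char *buf, size_t size, unsigned long long num, int width)
{
	assert(buf != NULL && size >= 2);
	return snprintf(buf, size, "%*llu", width, num);
}
```

With a width of 8, the first variant emits a bare "0" while the second emits seven spaces followed by "0"; values of two or more digits are padded identically by both, which is why only single-digit fields looked wrong.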