summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-24bpf: fix access to skb_shared_info->gso_segsEric Dumazet1-3/+3
It is possible we reach bpf_convert_ctx_access() with si->dst_reg == si->src_reg Therefore, we need to load BPF_REG_AX before eventually mangling si->src_reg. syzbot generated this x86 code : 3: 55 push %rbp 4: 48 89 e5 mov %rsp,%rbp 7: 48 81 ec 00 00 00 00 sub $0x0,%rsp // Might be avoided ? e: 53 push %rbx f: 41 55 push %r13 11: 41 56 push %r14 13: 41 57 push %r15 15: 6a 00 pushq $0x0 17: 31 c0 xor %eax,%eax 19: 48 8b bf c0 00 00 00 mov 0xc0(%rdi),%rdi 20: 44 8b 97 bc 00 00 00 mov 0xbc(%rdi),%r10d 27: 4c 01 d7 add %r10,%rdi 2a: 48 0f b7 7f 06 movzwq 0x6(%rdi),%rdi // Crash 2f: 5b pop %rbx 30: 41 5f pop %r15 32: 41 5e pop %r14 34: 41 5d pop %r13 36: 5b pop %rbx 37: c9 leaveq 38: c3 retq Fixes: d9ff286a0f59 ("bpf: allow BPF programs access skb_shared_info->gso_segs field") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-23bpf: fix narrower loads on s390Ilya Leoshkevich2-2/+15
The very first check in test_pkt_md_access is failing on s390, which happens because loading a part of a struct __sk_buff field produces an incorrect result. The preprocessed code of the check is: { __u8 tmp = *((volatile __u8 *)&skb->len + ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8))); if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2; }; clang generates the following code for it: 0: 71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3) 1: 61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0) 2: 57 30 00 00 00 00 00 ff r3 &= 255 3: 5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10> Finally, verifier transforms it to: 0: (61) r2 = *(u32 *)(r1 +104) 1: (bc) w2 = w2 2: (74) w2 >>= 24 3: (bc) w2 = w2 4: (54) w2 &= 255 5: (bc) w2 = w2 The problem is that when verifier emits the code to replace a partial load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a bitwise AND, it assumes that the machine is little endian and incorrectly decides to use a shift. Adjust shift count calculation to account for endianness. Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-22selftests/bpf: fix sendmsg6_prog on s390Ilya Leoshkevich1-2/+1
"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to leading IPv6 address digits, which is not the case on big-endian machines. Since checking bitwise operations doesn't seem to be the point of the test, replace two short comparisons with a single int comparison. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22libbpf: Avoid designated initializers for unnamed union membersArnaldo Carvalho de Melo1-7/+7
As it fails to build in some systems with: libbpf.c: In function 'perf_buffer__new': libbpf.c:4515: error: unknown field 'sample_period' specified in initializer libbpf.c:4516: error: unknown field 'wakeup_events' specified in initializer Doing as: attr.sample_period = 1; I.e. not as a designated initializer makes it build everywhere. Cc: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: fb84b8224655 ("libbpf: add perf buffer API") Link: https://lkml.kernel.org/n/tip-hnlmch8qit1ieksfppmr32si@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22libbpf: Fix endianness macro usage for some compilersArnaldo Carvalho de Melo2-4/+6
Using endian.h and its endianness macros makes this code build in a wider range of compilers, as some don't have those macros (__BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__), so use instead endian.h's macros (__BYTE_ORDER, __LITTLE_ENDIAN, __BIG_ENDIAN) which makes this code even shorter :-) Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 12ef5634a855 ("libbpf: simplify endianness check") Fixes: e6c64855fd7a ("libbpf: add btf__parse_elf API to load .BTF and .BTF.ext") Link: https://lkml.kernel.org/n/tip-eep5n8vgwcdphw3uc058k03u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22Merge branch 'bpf-sockmap-tls-fixes'Daniel Borkmann10-68/+419
Jakub Kicinski says: ==================== John says: Resolve a series of splats discovered by syzbot and an unhash TLS issue noted by Eric Dumazet. The main issues revolved around interaction between TLS and sockmap tear down. TLS and sockmap could both reset sk->prot ops creating a condition where a close or unhash op could be called forever. A rare race condition resulting from a missing rcu sync operation was causing a use after free. Then on the TLS side dropping the sock lock and re-acquiring it during the close op could hang. Finally, sockmap must be deployed before tls for current stack assumptions to be met. This is enforced now. A feature series can enable it. To fix this first refactor TLS code so the lock is held for the entire teardown operation. Then add an unhash callback to ensure TLS can not transition from ESTABLISHED to LISTEN state. This transition is a similar bug to the one found and fixed previously in sockmap. Then apply three fixes to sockmap to fix up races on tear down around map free and close. Finally, if sockmap is destroyed before TLS we add a new ULP op update to inform the TLS stack it should not call sockmap ops. This last one appears to be the most commonly found issue from syzbot. v4: - fix some use after frees; - disable disconnect work for offload (ctx lifetime is much more complex); - remove some of the dead code which made it hard to understand (for me) that things work correctly (e.g. the checks TLS is the top ULP); - add selftets. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add shutdown testsJakub Kicinski1-0/+27
Add test for killing the connection via shutdown. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: close the socket with open recordJakub Kicinski1-0/+10
Add test which sends some data with MSG_MORE and then closes the socket (never calling send without MSG_MORE). This should make sure we clean up open records correctly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add a bidirectional testJakub Kicinski1-0/+31
Add a simple test which installs the TLS state for both directions, sends and receives data on both sockets. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: test error codes around TLS ULP installationJakub Kicinski1-0/+52
Test the error codes returned when TCP connection is not in ESTABLISHED state. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add a test for ULP but no keysJakub Kicinski1-0/+74
Make sure we test the TLS_BASE/TLS_BASE case both with data and the tear down/clean up path. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap/tls, close can race with map freeJohn Fastabend5-8/+53
When a map free is called and in parallel a socket is closed we have two paths that can potentially reset the socket prot ops, the bpf close() path and the map free path. This creates a problem with which prot ops should be used from the socket closed side. If the map_free side completes first then we want to call the original lowest level ops. However, if the tls path runs first we want to call the sockmap ops. Additionally there was no locking around prot updates in TLS code paths so the prot ops could be changed multiple times once from TLS path and again from sockmap side potentially leaving ops pointed at either TLS or sockmap when psock and/or tls context have already been destroyed. To fix this race first only update ops inside callback lock so that TLS, sockmap and lowest level all agree on prot state. Second and a ULP callback update() so that lower layers can inform the upper layer when they are being removed allowing the upper layer to reset prot ops. This gets us close to allowing sockmap and tls to be stacked in arbitrary order but will save that patch for *next trees. v4: - make sure we don't free things for device; - remove the checks which swap the callbacks back only if TLS is at the top. Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, only create entry if ulp is not already enabledJohn Fastabend1-0/+3
Sockmap does not currently support adding sockets after TLS has been enabled. There never was a real use case for this so it was never added. But, we lost the test for ULP at some point so add it here and fail the socket insert if TLS is enabled. Future work could make sockmap support this use case but fixup the bug here. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, synchronize_rcu before free'ing mapJohn Fastabend1-0/+2
We need to have a synchronize_rcu before free'ing the sockmap because any outstanding psock references will have a pointer to the map and when they use this could trigger a use after free. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, sock_map_delete needs to use xchgJohn Fastabend1-5/+9
__sock_map_delete() may be called from a tcp event such as unhash or close from the following trace, tcp_bpf_close() tcp_bpf_remove() sk_psock_unlink() sock_map_delete_from_link() __sock_map_delete() In this case the sock lock is held but this only protects against duplicate removals on the TCP side. If the map is free'd then we have this trace, sock_map_free xchg() <- replaces map entry sock_map_unref() sk_psock_put() sock_map_del_link() The __sock_map_delete() call however uses a read, test, null over the map entry which can result in both paths trying to free the map entry. To fix use xchg in TCP paths as well so we avoid having two references to the same map entry. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: fix transition through disconnect with closeJohn Fastabend3-1/+65
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE state via tcp_disconnect() without actually calling tcp_close which would then call the tls close callback. Because of this a user could disconnect a socket then put it in a LISTEN state which would break our assumptions about sockets always being ESTABLISHED state. More directly because close() can call unhash() and unhash is implemented by sockmap if a sockmap socket has TLS enabled we can incorrectly destroy the psock from unhash() and then call its close handler again. But because the psock (sockmap socket representation) is already destroyed we call close handler in sk->prot. However, in some cases (TLS BASE/BASE case) this will still point at the sockmap close handler resulting in a circular call and crash reported by syzbot. To fix both above issues implement the unhash() routine for TLS. v4: - add note about tls offload still needing the fix; - move sk_proto to the cold cache line; - split TX context free into "release" and "free", otherwise the GC work itself is in already freed memory; - more TX before RX for consistency; - reuse tls_ctx_free(); - schedule the GC work after we're done with context to avoid UAF; - don't set the unhash in all modes, all modes "inherit" TLS_BASE's callbacks anyway; - disable the unhash hook for TLS_HW. Fixes: 3c4d7559159bf ("tls: kernel TLS support") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove sock unlock/lock around strp_done()John Fastabend4-45/+64
The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove close callback sock unlock/lock around TX work flushJohn Fastabend3-7/+22
The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't call tls_sk_proto_close for hw record offloadJakub Kicinski1-4/+0
The deprecated TOE offload doesn't actually do anything in tls_sk_proto_close() - all TLS code is skipped and context not freed. Remove the callback to make it easier to refactor tls_sk_proto_close(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't arm strparser immediately in tls_set_sw_offload()Jakub Kicinski4-10/+19
In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-19libbpf: sanitize VAR to conservative 1-byte INTAndrii Nakryiko1-2/+7
If VAR in non-sanitized BTF was size less than 4, converting such VAR into an INT with size=4 will cause BTF validation failure due to violationg of STRUCT (into which DATASEC was converted) member size. Fix by conservatively using size=1. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-19libbpf: fix SIGSEGV when BTF loading fails, but .BTF.ext existsAndrii Nakryiko1-0/+6
In case when BTF loading fails despite sanitization, but BPF object has .BTF.ext loaded as well, we free and null obj->btf, but not obj->btf_ext. This leads to an attempt to relocate .BTF.ext later on during bpf_object__load(), which assumes obj->btf is present. This leads to SIGSEGV on null pointer access. Fix bug by freeing and nulling obj->btf_ext as well. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-19tcp: fix tcp_set_congestion_control() use from bpf hookEric Dumazet4-6/+9
Neal reported incorrect use of ns_capable() from bpf hook. bpf_setsockopt(...TCP_CONGESTION...) -> tcp_set_congestion_control() -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) -> ns_capable_common() -> current_cred() -> rcu_dereference_protected(current->cred, 1) Accessing 'current' in bpf context makes no sense, since packets are processed from softirq context. As Neal stated : The capability check in tcp_set_congestion_control() was written assuming a system call context, and then was reused from a BPF call site. The fix is to add a new parameter to tcp_set_congestion_control(), so that the ns_capable() call is only performed under the right context. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Reported-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19ag71xx: fix return value check in ag71xx_probe()Wei Yongjun1-2/+2
In case of error, the function of_get_mac_address() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19ag71xx: fix error return code in ag71xx_probe()Wei Yongjun1-1/+3
Fix to return error code -ENOMEM from the dmam_alloc_coherent() error handling case instead of 0, as done elsewhere in this function. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19usb: qmi_wwan: add D-Link DWM-222 A2 device IDRogan Dawes1-0/+1
Signed-off-by: Rogan Dawes <rogan@dawes.za.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.Michael Chan1-2/+5
Unlike legacy chips, 57500 chips don't need additional VNIC resources for aRFS/ntuple. Fix the code accordingly so that we don't reserve and allocate additional VNICs on 57500 chips. Without this patch, the driver is failing to initialize when it tries to allocate extra VNICs. Fixes: ac33906c67e2 ("bnxt_en: Add support for aRFS on 57500 chips.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19net: dsa: sja1105: Fix missing unlock on error in sk_buff()Wei Yongjun1-0/+1
Add the missing unlock before return from function sk_buff() in the error handling case. Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19gve: replace kfree with kvfreeChuhong Yuan2-13/+13
Variables allocated by kvzalloc should not be freed by kfree. Because they may be allocated by vmalloc. So we replace kfree with kvfree here. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller32-195/+391
Alexei Starovoitov says: ==================== pull-request: bpf 2019-07-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) verifier precision propagation fix, from Andrii. 2) BTF size fix for typedefs, from Andrii. 3) a bunch of big endian fixes, from Ilya. 4) wide load from bpf_sock_addr fixes, from Stanislav. 5) a bunch of misc fixes from a number of developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18selftests/bpf: fix test_xdp_noinline on s390Ilya Leoshkevich1-8/+9
test_xdp_noinline fails on s390 due to a handful of endianness issues. Use ntohs for parsing eth_proto. Replace bswaps with ntohs/htons. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-18selftests/bpf: fix "valid read map access into a read-only array 1" on s390Ilya Leoshkevich1-1/+1
This test looks up a 32-bit map element and then loads it using a 64-bit load. This does not work on s390, which is a big-endian machine. Since the point of this test doesn't seem to be loading a smaller value using a larger load, simply use a 32-bit load. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-18net/mlx5: Replace kfree with kvfreeChuhong Yuan1-1/+1
Variable allocated by kvmalloc should not be freed by kfree. Because it may be allocated by vmalloc. So replace kfree with kvfree here. Fixes: 9b1f298236057 ("net/mlx5: Add support for FW fatal reporter dump") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18MAINTAINERS: update netsec driverIlias Apalodimas1-0/+1
Add myself to maintainers since i provided the XDP and page_pool implementation Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jassi Brar <jaswinder.singh@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18ipv6: Unlink sibling route in case of failureIdo Schimmel1-1/+17
When a route needs to be appended to an existing multipath route, fib6_add_rt2node() first appends it to the siblings list and increments the number of sibling routes on each sibling. Later, the function notifies the route via call_fib6_entry_notifiers(). In case the notification is vetoed, the route is not unlinked from the siblings list, which can result in a use-after-free. Fix this by unlinking the route from the siblings list before returning an error. Audited the rest of the call sites from which the FIB notification chain is called and could not find more problems. Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18Merge tag 'wireless-drivers-for-davem-2019-07-18' of ↵David S. Miller6-6/+93
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 5.3 First set of fixes for 5.3. iwlwifi * add new cards for 9000 and 20000 series and qu c-step devices ath10k * workaround an uninitialised variable warning rt2x00 * fix rx queue hand on USB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18liquidio: Replace vmalloc + memset with vzallocChuhong Yuan1-4/+2
Use vzalloc and vzalloc_node instead of using vmalloc and vmalloc_node and then zeroing the allocated memory by memset 0. This simplifies the code. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18udp: Fix typo in net/ipv4/udp.cSu Yanjun1-1/+1
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18net: bcmgenet: use promisc for unsupported filtersJustin Chen1-31/+26
Currently we silently ignore filters if we cannot meet the filter requirements. This will lead to the MAC dropping packets that are expected to pass. A better solution would be to set the NIC to promisc mode when the required filters cannot be met. Also correct the number of MDF filters supported. It should be 17, not 16. Signed-off-by: Justin Chen <justinpopo6@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18ipv6: rt6_check should return NULL if 'from' is NULLDavid Ahern1-1/+1
Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18tipc: initialize 'validated' field of received packetsJon Maloy1-0/+1
The tipc_msg_validate() function leaves a boolean flag 'validated' in the validated buffer's control block, to avoid performing this action more than once. However, at reception of new packets, the position of this field may already have been set by lower layer protocols, so that the packet is erroneously perceived as already validated by TIPC. We fix this by initializing the said field to 'false' before performing the initial validation. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18Merge branch 'ipv4-relax-source-validation-check-for-loopback-packets'David S. Miller2-1/+39
Cong Wang says: ==================== ipv4: relax source validation check for loopback packets This patchset fixes a corner case when loopback packets get dropped by rp_filter when we route them from veth to lo. Patch 1 is the fix and patch 2 provides a simplified test case for this scenario. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18selftests: add a test case for rp_filterCong Wang1-1/+34
Add a test case to simulate the loopback packet case fixed in the previous patch. This test gets passed after the fix: IPv4 rp_filter tests TEST: rp_filter passes local packets [ OK ] TEST: rp_filter passes loopback packets [ OK ] Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18fib: relax source validation check for loopback packetsCong Wang1-0/+5
In a rare case where we redirect local packets from veth to lo, these packets fail to pass the source validation when rp_filter is turned on, as the tracing shows: <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0 <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130 So, the fib table lookup returns eth0 as the nexthop even though the packets are local and should be routed to loopback nonetheless, but they can't pass the dev match check in fib_info_nh_uses_dev() without this patch. It should be safe to relax this check for this special case, as normally packets coming out of loopback device still have skb_dst so they won't even hit this slow path. Cc: Julian Anastasov <ja@ssi.bg> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18Merge branch 'mlxsw-Two-fixes'David S. Miller4-8/+25
Ido Schimmel says: ==================== mlxsw: Two fixes This patchset contains two fixes for mlxsw. Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even if the egress port was switched to Trust L2 mode where priority mapping is based on PCP. Patch #2 fixes a problem where packets can be learned on a non-existing FID if a tc filter with a redirect action is configured on a bridged port. The problem and fix are explained in detail in the commit message. Please consider both patches for 5.2.y ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18mlxsw: spectrum: Do not process learned records with a dummy FIDIdo Schimmel3-0/+17
The switch periodically sends notifications about learned FDB entries. Among other things, the notification includes the FID (Filtering Identifier) and the port on which the MAC was learned. In case the driver does not have the FID defined on the relevant port, the following error will be periodically generated: mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification This is not supposed to happen under normal conditions, but can happen if an ingress tc filter with a redirect action is installed on a bridged port. The redirect action will cause the packet's FID to be changed to the dummy FID and a learning notification will be emitted with this FID - which is not defined on the bridged port. Fix this by having the driver ignore learning notifications generated with the dummy FID and delete them from the device. Another option is to chain an ignore action after the redirect action which will cause the device to disable learning, but this means that we need to consume another action whenever a redirect action is used. In addition, the scenario described above is merely a corner case. Fixes: cedbb8b25948 ("mlxsw: spectrum_flower: Set dummy FID before forward action") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Kushnarov <alexanderk@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removedPetr Machata1-8/+8
Spectrum systems use DSCP rewrite map to update DSCP field in egressing packets to correspond to priority that the packet has. Whether rewriting will take place is determined at the point when the packet ingresses the switch: if the port is in Trust L3 mode, packet priority is determined from the DSCP map at the port, and DSCP rewrite will happen. If the port is in Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP rewrite will happen. The driver determines the port trust mode based on whether any DSCP prioritization rules are in effect at given port. If there are any, trust level is L3, otherwise it's L2. When the last DSCP rule is removed, the port is switched to trust L2. Under that scenario, if DSCP of a packet should be rewritten, it should be rewritten to 0. However, when switching to Trust L2, the driver neglects to also update the DSCP rewrite map. The last DSCP rule thus remains in effect, and packets egressing through this port, if they have the right priority, will have their DSCP set according to this rule. Fix by first configuring the rewrite map, and only then switching to trust L2 and bailing out. Fixes: b2b1dab6884e ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp") Signed-off-by: Petr Machata <petrm@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18net: ag71xx: Add missing headerRosen Penev1-0/+1
ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17net_sched: unset TCQ_F_CAN_BYPASS when adding filtersCong Wang3-4/+1
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS, notably fq_codel, it makes no sense to let packets bypass the TC filters we setup in any scenario, otherwise our packets steering policy could not be enforced. This can be reproduced easily with the following script: ip li add dev dummy0 type dummy ifconfig dummy0 up tc qd add dev dummy0 root fq_codel tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo ping -I dummy0 192.168.112.1 Without this patch, packets are sent directly to dummy0 without hitting any of the filters. With this patch, packets are redirected to loopback as expected. This fix is not perfect, it only unsets the flag but does not set it back because we have to save the information somewhere in the qdisc if we really want that. Note, both fq_codel and sfq clear this flag in their ->bind_tcf() but this is clearly not sufficient when we don't use any class ID. Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17Merge branch 'net-rds-RDMA-fixes'David S. Miller5-49/+109
Gerd Rausch says: ==================== net/rds: RDMA fixes A number of net/rds fixes necessary to make "rds_rdma.ko" pass some basic Oracle internal tests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>