summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)AuthorFilesLines
2018-07-28net/ipv6: Fix linklocal to global address with VRFDavid Ahern2-4/+7
[ Upstream commit 24b711edfc34bc45777a3f068812b7d1ed004a5d ] Example setup: host: ip -6 addr add dev eth1 2001:db8:104::4 where eth1 is enslaved to a VRF switch: ip -6 ro add 2001:db8:104::4/128 dev br1 where br1 only has an LLA ping6 2001:db8:104::4 ssh 2001:db8:104::4 (NOTE: UDP works fine if the PKTINFO has the address set to the global address and ifindex is set to the index of eth1 with a destination an LLA). For ICMP, icmp6_iif needs to be updated to check if skb->dev is an L3 master. If it is then return the ifindex from rt6i_idev similar to what is done for loopback. For TCP, restore the original tcp_v6_iif definition which is needed in most places and add a new tcp_v6_iif_l3_slave that considers the l3_slave variability. This latter check is only needed for socket lookups. Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28multicast: do not restore deleted record source filter mode to new oneHangbin Liu1-2/+1
There are two scenarios that we will restore deleted records. The first is when device down and up(or unmap/remap). In this scenario the new filter mode is same with previous one. Because we get it from in_dev->mc_list and we do not touch it during device down and up. The other scenario is when a new socket join a group which was just delete and not finish sending status reports. In this scenario, we should use the current filter mode instead of restore old one. Here are 4 cases in total. old_socket new_socket before_fix after_fix IN(A) IN(A) ALLOW(A) ALLOW(A) IN(A) EX( ) TO_IN( ) TO_EX( ) EX( ) IN(A) TO_EX( ) ALLOW(A) EX( ) EX( ) TO_EX( ) TO_EX( ) Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link down) Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link down) Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pullWillem de Bruijn1-2/+5
[ Upstream commit 2efd4fca703a6707cad16ab486eaab8fc7f0fd49 ] Syzbot reported a read beyond the end of the skb head when returning IPV6_ORIGDSTADDR: BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242 CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261 copy_to_user include/linux/uaccess.h:184 [inline] put_cmsg+0x5ef/0x860 net/core/scm.c:242 ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719 ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733 rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521 [..] This logic and its ipv4 counterpart read the destination port from the packet at skb_transport_offset(skb) + 4. With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a packet that stores headers exactly up to skb_transport_offset(skb) in the head and the remainder in a frag. Call pskb_may_pull before accessing the pointer to ensure that it lies in skb head. Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28ip: hash fragments consistentlyPaolo Abeni1-0/+2
[ Upstream commit 3dd1c9a1270736029ffca670e9bd0265f4120600 ] The skb hash for locally generated ip[v6] fragments belonging to the same datagram can vary in several circumstances: * for connected UDP[v6] sockets, the first fragment get its hash via set_owner_w()/skb_set_hash_from_sk() * for unconnected IPv6 UDPv6 sockets, the first fragment can get its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if auto_flowlabel is enabled For the following frags the hash is usually computed via skb_get_hash(). The above can cause OoO for unconnected IPv6 UDPv6 socket: in that scenario the egress tx queue can be selected on a per packet basis via the skb hash. It may also fool flow-oriented schedulers to place fragments belonging to the same datagram in different flows. Fix the issue by copying the skb hash from the head frag into the others at fragmentation time. Before this commit: perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8" netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n & perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1 perf script probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0 After this commit: probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 Fixes: b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit") Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25ipv6: make DAD fail with enhanced DAD when nonce length differsSabrina Dubroca1-1/+1
[ Upstream commit e66515999b627368892ccc9b3a13a506f2ea1357 ] Commit adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)") added enhanced DAD with a nonce length of 6 bytes. However, RFC7527 doesn't specify the length of the nonce, other than being 6 + 8*k bytes, with integer k >= 0 (RFC3971 5.3.2). The current implementation simply assumes that the nonce will always be 6 bytes, but others systems are free to choose different sizes. If another system sends a nonce of different length but with the same 6 bytes prefix, it shouldn't be considered as the same nonce. Thus, check that the length of the received nonce is the same as the length we sent. Ugly scapy test script running on veth0: def loop(): pkt=sniff(iface="veth0", filter="icmp6", count=1) pkt = pkt[0] b = bytearray(pkt[Raw].load) b[1] += 1 b += b'\xde\xad\xbe\xef\xde\xad\xbe\xef' pkt[Raw].load = bytes(b) pkt[IPv6].plen += 8 # fixup checksum after modifying the payload pkt[IPv6].payload.cksum -= 0x3b44 if pkt[IPv6].payload.cksum < 0: pkt[IPv6].payload.cksum += 0xffff sendp(pkt, iface="veth0") This should result in DAD failure for any address added to veth0's peer, but is currently ignored. Fixes: adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25net: ip6_gre: get ipv6hdr after skb_cow_head()Prashant Bhole1-1/+2
[ Upstream commit b7ed879425be371905d856410d19e9a42a62bcf3 ] A KASAN:use-after-free bug was found related to ip6-erspan while running selftests/net/ip6_gre_headroom.sh It happens because of following sequence: - ipv6hdr pointer is obtained from skb - skb_cow_head() is called, skb->head memory is reallocated - old data is accessed using ipv6hdr pointer skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit."), but looking at the history there was a chance of similar bug because gre_handle_offloads() and pskb_trim() can also reallocate skb->head memory. Fixes tag points to commit which introduced possibility of this bug. This patch moves ipv6hdr pointer assignment after skb_cow_head() call. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25net/ipv6: Do not allow device only routes via the multipath APIDavid Ahern1-0/+7
[ Upstream commit b5d2d75e079a918be686957b1a8d2f6c5cc95a0a ] Eric reported that reverting the patch that fixed and simplified IPv6 multipath routes means reverting back to invalid userspace notifications. eg., $ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1 only generates a single notification: 2001:db8:1::/64 dev eth0 metric 1024 pref medium While working on a fix for this problem I found another case that is just broken completely - a multipath route with a gateway followed by device followed by gateway: $ ip -6 ro add 2001:db8:103::/64 nexthop via 2001:db8:1::64 nexthop dev dummy2 nexthop via 2001:db8:3::64 In this case the device only route is dropped completely - no notification to userpsace but no addition to the FIB either: $ ip -6 ro ls 2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium 2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium 2001:db8:103::/64 metric 1024 nexthop via 2001:db8:1::64 dev dummy1 weight 1 nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy2 proto kernel metric 256 pref medium fe80::/64 dev dummy3 proto kernel metric 256 pref medium Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to device only routes, so do not allow it all. This change will break any scripts relying on the mpath api for insert, but I don't see any other way to handle the permutations. Besides, since the routes are added to the FIB as standalone (non-multipath) routes the kernel is not doing what the user requested, so it might as well tell the user that. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25ipv6: ila: select CONFIG_DST_CACHEArnd Bergmann1-0/+1
[ Upstream commit 83ed7d1fe2d2d4a11b30660dec20168bb473d9c1 ] My randconfig builds came across an old missing dependency for ILA: ERROR: "dst_cache_set_ip6" [net/ipv6/ila/ila.ko] undefined! ERROR: "dst_cache_get" [net/ipv6/ila/ila.ko] undefined! ERROR: "dst_cache_init" [net/ipv6/ila/ila.ko] undefined! ERROR: "dst_cache_destroy" [net/ipv6/ila/ila.ko] undefined! We almost never run into this by accident because randconfig builds end up selecting DST_CACHE from some other tunnel protocol, and this one appears to be the only one missing the explicit 'select'. >From all I can tell, this problem first appeared in linux-4.9 when dst_cache support got added to ILA. Fixes: 79ff2fc31e0f ("ila: Cache a route to translated address") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-22netfilter: ipv6: nf_defrag: drop skb dst before queueingFlorian Westphal1-0/+2
commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream. Eric Dumazet reports: Here is a reproducer of an annoying bug detected by syzkaller on our production kernel [..] ./b78305423 enable_conntrack Then : sleep 60 dmesg | tail -10 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2 Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute. Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-22net/tcp: Fix socket lookups with SO_BINDTODEVICEDavid Ahern1-2/+2
[ Upstream commit 8c43bd1706885ba1acfa88da02bc60a2ec16f68c ] Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups need to fail if dev_match is not true. Currently, a packet to a given port can match a socket bound to device when it should not. In the VRF case, this causes the lookup to hit a VRF socket and not a global socket resulting in a response trying to go through the VRF when it should not. Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups") Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups") Reported-by: Lou Berger <lberger@labn.net> Diagnosed-by: Renato Westphal <renato@opensourcerouting.org> Tested-by: Renato Westphal <renato@opensourcerouting.org> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-22ipv6: sr: fix passing wrong flags to crypto_alloc_shash()Eric Biggers1-1/+1
[ Upstream commit fc9c2029e37c3ae9efc28bf47045e0b87e09660c ] The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-17netfilter: x_tables: initialise match/target check parameter structFlorian Westphal1-0/+1
commit c568503ef02030f169c9e19204def610a3510918 upstream. syzbot reports following splat: BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506 ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline] ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline] The uninitialised access is xt_mtchk_param->nft_compat ... which should be set to 0. Fix it by zeroing the struct beforehand, same for tgchk. ip(6)tables targetinfo uses c99-style initialiser, so no change needed there. Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com Fixes: 55917a21d0cc0 ("netfilter: x_tables: add context to know if extension runs from nft_compat") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-08netfilter: ip6t_rpfilter: provide input interface for route lookupVincent Bernat1-0/+2
commit cede24d1b21d68d84ac5a36c44f7d37daadcc258 upstream. In commit 47b7e7f82802, this bit was removed at the same time the RT6_LOOKUP_F_IFACE flag was removed. However, it is needed when link-local addresses are used, which is a very common case: when packets are routed, neighbor solicitations are done using link-local addresses. For example, the following neighbor solicitation is not matched by "-m rpfilter": IP6 fe80::5254:33ff:fe00:1 > ff02::1:ff00:3: ICMP6, neighbor solicitation, who has 2001:db8::5254:33ff:fe00:3, length 32 Commit 47b7e7f82802 doesn't quite explain why we shouldn't use RT6_LOOKUP_F_IFACE in the rpfilter case. I suppose the interface check later in the function would make it redundant. However, the remaining of the routing code is using RT6_LOOKUP_F_IFACE when there is no source address (which matches rpfilter's case with a non-unicast destination, like with neighbor solicitation). Signed-off-by: Vincent Bernat <vincent@bernat.im> Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26udp: fix rx queue len reported by diag and proc interfacePaolo Abeni2-4/+5
[ Upstream commit 6c206b20092a3623184cff9470dba75d21507874 ] After commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()") the sk_rmem_alloc field does not measure exactly anymore the receive queue length, because we batch the rmem release. The issue is really apparent only after commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty"): the user space can easily check for an empty socket with not-0 queue length reported by the 'ss' tool or the procfs interface. We need to use a custom UDP helper to report the correct queue length, taking into account the forward allocation deficit. Reported-by: trevor.francis@46labs.com Fixes: 6b229cf77d68 ("UDP: add batching to udp_rmem_release()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26tcp: verify the checksum of the first data segment in a new connectionFrank van der Linden1-0/+4
[ Upstream commit 4fd44a98ffe0d048246efef67ed640fdf2098a62 ] commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") introduced an optimization for the handling of child sockets created for a new TCP connection. But this optimization passes any data associated with the last ACK of the connection handshake up the stack without verifying its checksum, because it calls tcp_child_process(), which in turn calls tcp_rcv_state_process() directly. These lower-level processing functions do not do any checksum verification. Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to fix this. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ipv6: allow PMTU exceptions to local routesJulian Anastasov1-3/+0
[ Upstream commit 0975764684487bf3f7a47eef009e750ea41bd514 ] IPVS setups with local client and remote tunnel server need to create exception for the local virtual IP. What we do is to change PMTU from 64KB (on "lo") to 1460 in the common case. Suggested-by: Martin KaFai Lau <kafai@fb.com> Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Fixes: 7343ff31ebf0 ("ipv6: Don't create clones of host routes.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11ipmr: fix error path when ipmr_new_table failsSabrina Dubroca1-6/+12
[ Upstream commit e783bb00ad86d9d1f01d9d3a750713070036358e ] commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table") refactored ipmr_new_table, so that it now returns NULL when mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table expect an ERR_PTR. This can result in NULL deref, for example when ipmr_rules_exit calls ipmr_free_table with NULL net->ipv4.mrt in the !CONFIG_IP_MROUTE_MULTIPLE_TABLES version. This patch makes mr_table_alloc return errors, and changes ip6mr_new_table and its callers to return/expect error pointers as well. It also removes the version of mr_table_alloc defined under !CONFIG_IP_MROUTE_COMMON, since it is never used. Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11vrf: check the original netdevice for generating redirectStephen Suryaputra2-1/+8
[ Upstream commit 2f17becfbea5e9a0529b51da7345783e96e69516 ] Use the right device to determine if redirect should be sent especially when using vrf. Same as well as when sending the redirect. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11ipv6: omit traffic class when calculating flow hashMichal Kubecek1-2/+2
[ Upstream commit fa1be7e01ea863e911349e30456706749518eeab ] Some of the code paths calculating flow hash for IPv6 use flowlabel member of struct flowi6 which, despite its name, encodes both flow label and traffic class. If traffic class changes within a TCP connection (as e.g. ssh does), ECMP route can switch between path. It's also inconsistent with other code paths where ip6_flowlabel() (returning only flow label) is used to feed the key. Use only flow label everywhere, including one place where hash key is set using ip6_flowinfo(). Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Fixes: f70ea018da06 ("net: Add functions to get skb->hash based on flow structures") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeedsSabrina Dubroca1-1/+2
[ Upstream commit 848235edb5c93ed086700584c8ff64f6d7fc778d ] Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same setsockopt will fail with -ENOENT, since we haven't actually created that table. A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups"). Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11netfilter: nf_flow_table: attach dst to skbsJason A. Donenfeld1-0/+1
commit 2a79fd3908acd88e6cb0e620c314d7b1fee56a02 upstream. Some drivers, such as vxlan and wireguard, use the skb's dst in order to determine things like PMTU. They therefore loose functionality when flow offloading is enabled. So, we ensure the skb has it before xmit'ing it in the offloading path. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-01ip6_tunnel: remove magic mtu value 0xFFF8Nicolas Dichtel2-5/+11
I don't know where this value comes from (probably a copy and paste and paste and paste ...). Let's use standard values which are a bit greater. Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01Merge branch 'master' of ↵David S. Miller1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-05-31 1) Avoid possible overflow of the offset variable in _decode_session6(), this fixes an infinite lookp there. From Eric Dumazet. 2) We may use an error pointer in the error path of xfrm_bundle_create(). Fix this by returning this pointer directly to the caller. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inlineMathieu Xhonneux1-2/+2
seg6_do_srh_encap and seg6_do_srh_inline can possibly do an out-of-bounds access when adding the SRH to the packet. This no longer happen when expanding the skb not only by the size of the SRH (+ outer IPv6 header), but also by skb->mac_len. [ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620 [ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674 [ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90 [ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 53.796673] Call Trace: [ 53.796679] <IRQ> [ 53.796689] dump_stack+0x71/0xab [ 53.796700] print_address_description+0x6a/0x270 [ 53.796707] kasan_report+0x258/0x380 [ 53.796715] ? seg6_do_srh_encap+0x284/0x620 [ 53.796722] memmove+0x34/0x50 [ 53.796730] seg6_do_srh_encap+0x284/0x620 [ 53.796741] ? seg6_do_srh+0x29b/0x360 [ 53.796747] seg6_do_srh+0x29b/0x360 [ 53.796756] seg6_input+0x2e/0x2e0 [ 53.796765] lwtunnel_input+0x93/0xd0 [ 53.796774] ipv6_rcv+0x690/0x920 [ 53.796783] ? ip6_input+0x170/0x170 [ 53.796791] ? eth_gro_receive+0x2d0/0x2d0 [ 53.796800] ? ip6_input+0x170/0x170 [ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0 [ 53.796820] ? netdev_info+0x110/0x110 [ 53.796827] ? napi_complete_done+0xb6/0x170 [ 53.796834] ? e1000_clean+0x6da/0xf70 [ 53.796845] ? process_backlog+0x129/0x2a0 [ 53.796853] process_backlog+0x129/0x2a0 [ 53.796862] net_rx_action+0x211/0x5c0 [ 53.796870] ? napi_complete_done+0x170/0x170 [ 53.796887] ? run_rebalance_domains+0x11f/0x150 [ 53.796891] __do_softirq+0x10e/0x39e [ 53.796894] do_softirq_own_stack+0x2a/0x40 [ 53.796895] </IRQ> [ 53.796898] do_softirq.part.16+0x54/0x60 [ 53.796900] __local_bh_enable_ip+0x5b/0x60 [ 53.796903] ip6_finish_output2+0x416/0x9f0 [ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110 [ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390 [ 53.796911] ? __rcu_read_unlock+0x66/0x80 [ 53.796913] ? ip6_mtu+0x44/0xf0 [ 53.796916] ? ip6_output+0xfc/0x220 [ 53.796918] ip6_output+0xfc/0x220 [ 53.796921] ? ip6_finish_output+0x2b0/0x2b0 [ 53.796923] ? memcpy+0x34/0x50 [ 53.796926] ip6_send_skb+0x43/0xc0 [ 53.796929] rawv6_sendmsg+0x1216/0x1530 [ 53.796932] ? __orc_find+0x6b/0xc0 [ 53.796934] ? rawv6_rcv_skb+0x160/0x160 [ 53.796937] ? __rcu_read_unlock+0x66/0x80 [ 53.796939] ? __rcu_read_unlock+0x66/0x80 [ 53.796942] ? is_bpf_text_address+0x1e/0x30 [ 53.796944] ? kernel_text_address+0xec/0x100 [ 53.796946] ? __kernel_text_address+0xe/0x30 [ 53.796948] ? unwind_get_return_address+0x2f/0x50 [ 53.796950] ? __save_stack_trace+0x92/0x100 [ 53.796954] ? save_stack+0x89/0xb0 [ 53.796956] ? kasan_kmalloc+0xa0/0xd0 [ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0 [ 53.796961] ? prepare_creds+0x23/0x160 [ 53.796963] ? __x64_sys_capset+0x252/0x3e0 [ 53.796966] ? do_syscall_64+0x69/0x160 [ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.796971] ? __alloc_pages_nodemask+0x170/0x380 [ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0 [ 53.796977] ? tty_vhangup+0x20/0x20 [ 53.796979] ? policy_nodemask+0x1a/0x90 [ 53.796982] ? __mod_node_page_state+0x8d/0xa0 [ 53.796986] ? __check_object_size+0xe7/0x240 [ 53.796989] ? __sys_sendto+0x229/0x290 [ 53.796991] ? rawv6_rcv_skb+0x160/0x160 [ 53.796993] __sys_sendto+0x229/0x290 [ 53.796996] ? __ia32_sys_getpeername+0x50/0x50 [ 53.796999] ? commit_creds+0x2de/0x520 [ 53.797002] ? security_capset+0x57/0x70 [ 53.797004] ? __x64_sys_capset+0x29f/0x3e0 [ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0 [ 53.797011] ? __do_page_fault+0x664/0x770 [ 53.797014] __x64_sys_sendto+0x74/0x90 [ 53.797017] do_syscall_64+0x69/0x160 [ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797022] RIP: 0033:0x7f43b7a6714a [ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a [ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004 [ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c [ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661 [ 53.797171] Allocated by task 642: [ 53.797460] kasan_kmalloc+0xa0/0xd0 [ 53.797463] kmem_cache_alloc+0xd2/0x1f0 [ 53.797465] getname_flags+0x40/0x210 [ 53.797467] user_path_at_empty+0x1d/0x40 [ 53.797469] do_faccessat+0x12a/0x320 [ 53.797471] do_syscall_64+0x69/0x160 [ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797607] Freed by task 642: [ 53.797869] __kasan_slab_free+0x130/0x180 [ 53.797871] kmem_cache_free+0xa8/0x230 [ 53.797872] filename_lookup+0x15b/0x230 [ 53.797874] do_faccessat+0x12a/0x320 [ 53.797876] do_syscall_64+0x69/0x160 [ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.798014] The buggy address belongs to the object at ffff88011975e600 which belongs to the cache names_cache of size 4096 [ 53.799043] The buggy address is located 1786 bytes inside of 4096-byte region [ffff88011975e600, ffff88011975f600) [ 53.800013] The buggy address belongs to the page: [ 53.800414] page:ffffea000465d600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 53.801259] flags: 0x17fff0000008100(slab|head) [ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000 0000000100070007 [ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40 0000000000000000 [ 53.803787] page dumped because: kasan: bad access detected [ 53.804384] Memory state around the buggy address: [ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.806577] ^ [ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.808356] ================================================================== [ 53.808949] Disabling lock debugging due to kernel taint Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-20net: ip6_gre: fix tunnel metadata device sharing.William Tu1-22/+79
Currently ip6gre and ip6erspan share single metadata mode device, using 'collect_md_tun'. Thus, when doing: ip link add dev ip6gre11 type ip6gretap external ip link add dev ip6erspan12 type ip6erspan external RTNETLINK answers: File exists simply fails due to the 2nd tries to create the same collect_md_tun. The patch fixes it by adding a separate collect md tunnel device for the ip6erspan, 'collect_md_tun_erspan'. As a result, a couple of places need to refactor/split up in order to distinguish ip6gre and ip6erspan. First, move the collect_md check at ip6gre_tunnel_{unlink,link} and create separate function {ip6gre,ip6ersapn}_tunnel_{link_md,unlink_md}. Then before link/unlink, make sure the link_md/unlink_md is called. Finally, a separate ndo_uninit is created for ip6erspan. Tested it using the samples/bpf/test_tunnel_bpf.sh. Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18net: test tailroom before appending to linear skbWillem de Bruijn1-1/+2
Device features may change during transmission. In particular with corking, a device may toggle scatter-gather in between allocating and writing to an skb. Do not unconditionally assume that !NETIF_F_SG at write time implies that the same held at alloc time and thus the skb has sufficient tailroom. This issue predates git history. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Fix ip6erspan hlen calculationPetr Machata1-9/+65
Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which overwrites these settings with GRE-specific ones. Similarly for changelink callbacks, which are handled by ip6gre_changelink() calls ip6gre_tnl_change() calls ip6gre_tnl_link_config() as well. The difference ends up being 12 vs. 20 bytes, and this is generally not a problem, because a 12-byte request likely ends up allocating more and the extra 8 bytes are thus available. However correct it is not. So replace the newlink and changelink callbacks with an ERSPAN-specific ones, reusing the newly-introduced _common() functions. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Split up ip6gre_changelink()Petr Machata1-9/+24
Extract from ip6gre_changelink() a reusable function ip6gre_changelink_common(). This will allow introduction of ERSPAN-specific _changelink() function with not a lot of code duplication. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Split up ip6gre_newlink()Petr Machata1-6/+18
Extract from ip6gre_newlink() a reusable function ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be made customizable for ERSPAN, thus reorder it with calls to ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink() function without a lot of duplicity. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Split up ip6gre_tnl_change()Petr Machata1-2/+8
Split a reusable function ip6gre_tnl_copy_tnl_parm() from ip6gre_tnl_change(). This will allow ERSPAN-specific code to reuse the common parts while customizing the behavior for ERSPAN. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Split up ip6gre_tnl_link_config()Petr Machata1-12/+26
The function ip6gre_tnl_link_config() is used for setting up configuration of both ip6gretap and ip6erspan tunnels. Split the function into the common part and the route-lookup part. The latter then takes the calculated header length as an argument. This split will allow the patches down the line to sneak in a custom header length computation for the ERSPAN tunnel. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()Petr Machata1-1/+1
dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts out zero. Thus the call to skb_cow_head() fails to actually make sure there's enough headroom to push the ERSPAN headers to. That can lead to the panic cited below. (Reproducer below that). Fix by requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise. [ 190.703567] kernel BUG at net/core/skbuff.c:104! [ 190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld [ 190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10 [ 190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 190.747006] Workqueue: ipv6_addrconf addrconf_dad_work [ 190.752222] RIP: 0010:skb_panic+0xc3/0x100 [ 190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282 [ 190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000 [ 190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54 [ 190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19 [ 190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe [ 190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8 [ 190.797621] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000 [ 190.805786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0 [ 190.818790] Call Trace: [ 190.821264] <IRQ> [ 190.823314] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.828940] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.834562] skb_push+0x78/0x90 [ 190.837749] ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.843219] ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre] [ 190.848577] ? debug_check_no_locks_freed+0x210/0x210 [ 190.853679] ? debug_check_no_locks_freed+0x210/0x210 [ 190.858783] ? print_irqtrace_events+0x120/0x120 [ 190.863451] ? sched_clock_cpu+0x18/0x210 [ 190.867496] ? cyc2ns_read_end+0x10/0x10 [ 190.871474] ? skb_network_protocol+0x76/0x200 [ 190.875977] dev_hard_start_xmit+0x137/0x770 [ 190.880317] ? do_raw_spin_trylock+0x6d/0xa0 [ 190.884624] sch_direct_xmit+0x2ef/0x5d0 [ 190.888589] ? pfifo_fast_dequeue+0x3fa/0x670 [ 190.892994] ? pfifo_fast_change_tx_queue_len+0x810/0x810 [ 190.898455] ? __lock_is_held+0xa0/0x160 [ 190.902422] __qdisc_run+0x39e/0xfc0 [ 190.906041] ? _raw_spin_unlock+0x29/0x40 [ 190.910090] ? pfifo_fast_enqueue+0x24b/0x3e0 [ 190.914501] ? sch_direct_xmit+0x5d0/0x5d0 [ 190.918658] ? pfifo_fast_dequeue+0x670/0x670 [ 190.923047] ? __dev_queue_xmit+0x172/0x1770 [ 190.927365] ? preempt_count_sub+0xf/0xd0 [ 190.931421] __dev_queue_xmit+0x410/0x1770 [ 190.935553] ? ___slab_alloc+0x605/0x930 [ 190.939524] ? print_irqtrace_events+0x120/0x120 [ 190.944186] ? memcpy+0x34/0x50 [ 190.947364] ? netdev_pick_tx+0x1c0/0x1c0 [ 190.951428] ? __skb_clone+0x2fd/0x3d0 [ 190.955218] ? __copy_skb_header+0x270/0x270 [ 190.959537] ? rcu_read_lock_sched_held+0x93/0xa0 [ 190.964282] ? kmem_cache_alloc+0x344/0x4d0 [ 190.968520] ? cyc2ns_read_end+0x10/0x10 [ 190.972495] ? skb_clone+0x123/0x230 [ 190.976112] ? skb_split+0x820/0x820 [ 190.979747] ? tcf_mirred+0x554/0x930 [act_mirred] [ 190.984582] tcf_mirred+0x554/0x930 [act_mirred] [ 190.989252] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred] [ 190.996109] ? __lock_acquire+0x706/0x26e0 [ 191.000239] ? sched_clock_cpu+0x18/0x210 [ 191.004294] tcf_action_exec+0xcf/0x2a0 [ 191.008179] tcf_classify+0xfa/0x340 [ 191.011794] __netif_receive_skb_core+0x8e1/0x1c60 [ 191.016630] ? debug_check_no_locks_freed+0x210/0x210 [ 191.021732] ? nf_ingress+0x500/0x500 [ 191.025458] ? process_backlog+0x347/0x4b0 [ 191.029619] ? print_irqtrace_events+0x120/0x120 [ 191.034302] ? lock_acquire+0xd8/0x320 [ 191.038089] ? process_backlog+0x1b6/0x4b0 [ 191.042246] ? process_backlog+0xc2/0x4b0 [ 191.046303] process_backlog+0xc2/0x4b0 [ 191.050189] net_rx_action+0x5cc/0x980 [ 191.053991] ? napi_complete_done+0x2c0/0x2c0 [ 191.058386] ? mark_lock+0x13d/0xb40 [ 191.062001] ? clockevents_program_event+0x6b/0x1d0 [ 191.066922] ? print_irqtrace_events+0x120/0x120 [ 191.071593] ? __lock_is_held+0xa0/0x160 [ 191.075566] __do_softirq+0x1d4/0x9d2 [ 191.079282] ? ip6_finish_output2+0x524/0x1460 [ 191.083771] do_softirq_own_stack+0x2a/0x40 [ 191.087994] </IRQ> [ 191.090130] do_softirq.part.13+0x38/0x40 [ 191.094178] __local_bh_enable_ip+0x135/0x190 [ 191.098591] ip6_finish_output2+0x54d/0x1460 [ 191.102916] ? ip6_forward_finish+0x2f0/0x2f0 [ 191.107314] ? ip6_mtu+0x3c/0x2c0 [ 191.110674] ? ip6_finish_output+0x2f8/0x650 [ 191.114992] ? ip6_output+0x12a/0x500 [ 191.118696] ip6_output+0x12a/0x500 [ 191.122223] ? ip6_route_dev_notify+0x5b0/0x5b0 [ 191.126807] ? ip6_finish_output+0x650/0x650 [ 191.131120] ? ip6_fragment+0x1a60/0x1a60 [ 191.135182] ? icmp6_dst_alloc+0x26e/0x470 [ 191.139317] mld_sendpack+0x672/0x830 [ 191.143021] ? igmp6_mcf_seq_next+0x2f0/0x2f0 [ 191.147429] ? __local_bh_enable_ip+0x77/0x190 [ 191.151913] ipv6_mc_dad_complete+0x47/0x90 [ 191.156144] addrconf_dad_completed+0x561/0x720 [ 191.160731] ? addrconf_rs_timer+0x3a0/0x3a0 [ 191.165036] ? mark_held_locks+0xc9/0x140 [ 191.169095] ? __local_bh_enable_ip+0x77/0x190 [ 191.173570] ? addrconf_dad_work+0x50d/0xa20 [ 191.177886] ? addrconf_dad_work+0x529/0xa20 [ 191.182194] addrconf_dad_work+0x529/0xa20 [ 191.186342] ? addrconf_dad_completed+0x720/0x720 [ 191.191088] ? __lock_is_held+0xa0/0x160 [ 191.195059] ? process_one_work+0x45d/0xe20 [ 191.199302] ? process_one_work+0x51e/0xe20 [ 191.203531] ? rcu_read_lock_sched_held+0x93/0xa0 [ 191.208279] process_one_work+0x51e/0xe20 [ 191.212340] ? pwq_dec_nr_in_flight+0x200/0x200 [ 191.216912] ? get_lock_stats+0x4b/0xf0 [ 191.220788] ? preempt_count_sub+0xf/0xd0 [ 191.224844] ? worker_thread+0x219/0x860 [ 191.228823] ? do_raw_spin_trylock+0x6d/0xa0 [ 191.233142] worker_thread+0xeb/0x860 [ 191.236848] ? process_one_work+0xe20/0xe20 [ 191.241095] kthread+0x206/0x300 [ 191.244352] ? process_one_work+0xe20/0xe20 [ 191.248587] ? kthread_stop+0x570/0x570 [ 191.252459] ret_from_fork+0x3a/0x50 [ 191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24 [ 191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0 [ 191.281024] ---[ end trace 7ea51094e099e006 ]--- [ 191.285724] Kernel panic - not syncing: Fatal exception in interrupt [ 191.292168] Kernel Offset: disabled [ 191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Reproducer: ip link add h1 type veth peer name swp1 ip link add h3 type veth peer name swp3 ip link set dev h1 up ip address add 192.0.2.1/28 dev h1 ip link add dev vh3 type vrf table 20 ip link set dev h3 master vh3 ip link set dev vh3 up ip link set dev h3 up ip link set dev swp3 up ip address add dev swp3 2001:db8:2::1/64 ip link set dev swp1 up tc qdisc add dev swp1 clsact ip link add name gt6 type ip6erspan \ local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123 ip link set dev gt6 up sleep 1 tc filter add dev swp1 ingress pref 1000 matchall skip_hw \ action mirred egress mirror dev gt6 ping -I h1 192.0.2.2 Fixes: e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit.") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net: ip6_gre: Request headroom in __gre6_xmit()Petr Machata1-0/+3
__gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit() for generic IP-in-IP processing. However it doesn't make sure that there is enough headroom to push the header to. That can lead to the panic cited below. (Reproducer below that). Fix by requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise. [ 158.576725] kernel BUG at net/core/skbuff.c:104! [ 158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld [ 158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10 [ 158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 158.620426] RIP: 0010:skb_panic+0xc3/0x100 [ 158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286 [ 158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000 [ 158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18 [ 158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19 [ 158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66 [ 158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68 [ 158.666007] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000 [ 158.674212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0 [ 158.687228] Call Trace: [ 158.689752] ? __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.694475] ? __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.699141] skb_push+0x78/0x90 [ 158.702344] __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.706872] ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre] [ 158.711992] ? __gre6_xmit+0xd80/0xd80 [ip6_gre] [ 158.716668] ? debug_check_no_locks_freed+0x210/0x210 [ 158.721761] ? print_irqtrace_events+0x120/0x120 [ 158.726461] ? sched_clock_cpu+0x18/0x210 [ 158.730572] ? sched_clock_cpu+0x18/0x210 [ 158.734692] ? cyc2ns_read_end+0x10/0x10 [ 158.738705] ? skb_network_protocol+0x76/0x200 [ 158.743216] ? netif_skb_features+0x1b2/0x550 [ 158.747648] dev_hard_start_xmit+0x137/0x770 [ 158.752010] sch_direct_xmit+0x2ef/0x5d0 [ 158.755992] ? pfifo_fast_dequeue+0x3fa/0x670 [ 158.760460] ? pfifo_fast_change_tx_queue_len+0x810/0x810 [ 158.765975] ? __lock_is_held+0xa0/0x160 [ 158.770002] __qdisc_run+0x39e/0xfc0 [ 158.773673] ? _raw_spin_unlock+0x29/0x40 [ 158.777781] ? pfifo_fast_enqueue+0x24b/0x3e0 [ 158.782191] ? sch_direct_xmit+0x5d0/0x5d0 [ 158.786372] ? pfifo_fast_dequeue+0x670/0x670 [ 158.790818] ? __dev_queue_xmit+0x172/0x1770 [ 158.795195] ? preempt_count_sub+0xf/0xd0 [ 158.799313] __dev_queue_xmit+0x410/0x1770 [ 158.803512] ? ___slab_alloc+0x605/0x930 [ 158.807525] ? ___slab_alloc+0x605/0x930 [ 158.811540] ? memcpy+0x34/0x50 [ 158.814768] ? netdev_pick_tx+0x1c0/0x1c0 [ 158.818895] ? __skb_clone+0x2fd/0x3d0 [ 158.822712] ? __copy_skb_header+0x270/0x270 [ 158.827079] ? rcu_read_lock_sched_held+0x93/0xa0 [ 158.831903] ? kmem_cache_alloc+0x344/0x4d0 [ 158.836199] ? skb_clone+0x123/0x230 [ 158.839869] ? skb_split+0x820/0x820 [ 158.843521] ? tcf_mirred+0x554/0x930 [act_mirred] [ 158.848407] tcf_mirred+0x554/0x930 [act_mirred] [ 158.853104] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred] [ 158.860005] ? __lock_acquire+0x706/0x26e0 [ 158.864162] ? mark_lock+0x13d/0xb40 [ 158.867832] tcf_action_exec+0xcf/0x2a0 [ 158.871736] tcf_classify+0xfa/0x340 [ 158.875402] __netif_receive_skb_core+0x8e1/0x1c60 [ 158.880334] ? nf_ingress+0x500/0x500 [ 158.884059] ? process_backlog+0x347/0x4b0 [ 158.888241] ? lock_acquire+0xd8/0x320 [ 158.892050] ? process_backlog+0x1b6/0x4b0 [ 158.896228] ? process_backlog+0xc2/0x4b0 [ 158.900291] process_backlog+0xc2/0x4b0 [ 158.904210] net_rx_action+0x5cc/0x980 [ 158.908047] ? napi_complete_done+0x2c0/0x2c0 [ 158.912525] ? rcu_read_unlock+0x80/0x80 [ 158.916534] ? __lock_is_held+0x34/0x160 [ 158.920541] __do_softirq+0x1d4/0x9d2 [ 158.924308] ? trace_event_raw_event_irq_handler_exit+0x140/0x140 [ 158.930515] run_ksoftirqd+0x1d/0x40 [ 158.934152] smpboot_thread_fn+0x32b/0x690 [ 158.938299] ? sort_range+0x20/0x20 [ 158.941842] ? preempt_count_sub+0xf/0xd0 [ 158.945940] ? schedule+0x5b/0x140 [ 158.949412] kthread+0x206/0x300 [ 158.952689] ? sort_range+0x20/0x20 [ 158.956249] ? kthread_stop+0x570/0x570 [ 158.960164] ret_from_fork+0x3a/0x50 [ 158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24 [ 158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110 [ 158.988935] ---[ end trace 5af56ee845aa6cc8 ]--- [ 158.993641] Kernel panic - not syncing: Fatal exception in interrupt [ 159.000176] Kernel Offset: disabled [ 159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Reproducer: ip link add h1 type veth peer name swp1 ip link add h3 type veth peer name swp3 ip link set dev h1 up ip address add 192.0.2.1/28 dev h1 ip link add dev vh3 type vrf table 20 ip link set dev h3 master vh3 ip link set dev vh3 up ip link set dev h3 up ip link set dev swp3 up ip address add dev swp3 2001:db8:2::1/64 ip link set dev swp1 up tc qdisc add dev swp1 clsact ip link add name gt6 type ip6gretap \ local 2001:db8:2::1 remote 2001:db8:2::2 ip link set dev gt6 up sleep 1 tc filter add dev swp1 ingress pref 1000 matchall skip_hw \ action mirred egress mirror dev gt6 ping -I h1 192.0.2.2 Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17erspan: fix invalid erspan version.William Tu1-1/+4
ERSPAN only support version 1 and 2. When packets send to an erspan device which does not have proper version number set, drop the packet. In real case, we observe multicast packets sent to the erspan pernet device, erspan0, which does not have erspan version configured. Reported-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14xfrm6: avoid potential infinite loop in _decode_session6()Eric Dumazet1-1/+1
syzbot found a way to trigger an infinitie loop by overflowing @offset variable that has been forced to use u16 for some very obscure reason in the past. We probably want to look at NEXTHDR_FRAGMENT handling which looks wrong, in a separate patch. In net-next, we shall try to use skb_header_pointer() instead of pskb_may_pull(). watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553] Modules linked in: irq event stamp: 13885653 hardirqs last enabled at (13885652): [<ffffffff878009d5>] restore_regs_and_return_to_kernel+0x0/0x2b hardirqs last disabled at (13885653): [<ffffffff87800905>] interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625 softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_napi_alloc_frags drivers/net/tun.c:1478 [inline] softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825 softirqs last disabled at (13614032): [<ffffffff84df1b6f>] tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942 CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101 RSP: 0018:ffff8801d8cfe250 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffff8801d88a8080 RBX: ffff8801d7389e40 RCX: 0000000000000006 RDX: 0000000000000000 RSI: ffffffff868da4ad RDI: ffff8801c8a53277 RBP: ffff8801d8cfe250 R08: ffff8801d88a8080 R09: ffff8801d8cfe3e8 R10: ffffed003b19fc87 R11: ffff8801d8cfe43f R12: ffff8801c8a5327f R13: 0000000000000000 R14: ffff8801c8a4e5fe R15: ffff8801d8cfe3e8 FS: 0000000000d88940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600400 CR3: 00000001acab3000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline] icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372 icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43 ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305 NF_HOOK include/linux/netfilter.h:288 [inline] ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71 NF_HOOK include/linux/netfilter.h:288 [inline] ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208 __netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711 netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785 napi_frags_finish net/core/dev.c:5226 [inline] napi_gro_frags+0x631/0xc40 net/core/dev.c:5299 tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996 call_write_iter include/linux/fs.h:1784 [inline] do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680 do_iter_write+0x185/0x5f0 fs/read_write.c:959 vfs_writev+0x1c7/0x330 fs/read_write.c:1004 do_writev+0x112/0x2f0 fs/read_write.c:1039 __do_sys_writev fs/read_write.c:1112 [inline] __se_sys_writev fs/read_write.c:1109 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-05-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+1
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Fix handling of simultaneous open TCP connection in conntrack, from Jozsef Kadlecsik. 2) Insufficient sanitify check of xtables extension names, from Florian Westphal. 3) Skip unnecessary synchronize_rcu() call when transaction log is already empty, from Florian Westphal. 4) Incorrect destination mac validation in ebt_stp, from Stephen Hemminger. 5) xtables module reference counter leak in nft_compat, from Florian Westphal. 6) Incorrect connection reference counting logic in IPVS one-packet scheduler, from Julian Anastasov. 7) Wrong stats for 32-bits CPU in IPVS, also from Julian. 8) Calm down sparse error in netfilter core, also from Florian. 9) Use nla_strlcpy to fix compilation warning in nfnetlink_acct and nfnetlink_cthelper, again from Florian. 10) Missing module alias in icmp and icmp6 xtables extensions, from Florian Westphal. 11) Base chain statistics in nf_tables may be unset/null, from Florian. 12) Fix handling of large matchinfo size in nft_compat, this includes one preparation for before this fix. From Florian. 13) Fix bogus EBUSY error when deleting chains due to incorrect reference counting from the preparation phase of the two-phase commit protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10udp: fix SO_BINDTODEVICEPaolo Abeni1-2/+2
Damir reported a breakage of SO_BINDTODEVICE for UDP sockets. In absence of VRF devices, after commit fb74c27735f0 ("net: ipv4: add second dif to udp socket lookups") the dif mismatch isn't fatal anymore for UDP socket lookup with non null sk_bound_dev_if, breaking SO_BINDTODEVICE semantics. This changeset addresses the issue making the dif match mandatory again in the above scenario. Reported-by: Damir Mansurov <dnman@oktetlabs.ru> Fixes: fb74c27735f0 ("net: ipv4: add second dif to udp socket lookups") Fixes: 1801b570dd2a ("net: ipv6: add second dif to udp socket lookups") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08netfilter: x_tables: add module alias for icmp matchesFlorian Westphal1-0/+1
The icmp matches are implemented in ip_tables and ip6_tables, respectively, so for normal iptables they are always available: those modules are loaded once iptables calls getsockopt() to fetch available module revisions. In iptables-over-nftables case probing occurs via nfnetlink, so these modules might not be loaded. Add aliases so modprobe can load these when icmp/icmp6 is requested. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-08trivial: fix inconsistent help textsGeorg Hofmann1-5/+4
This patch removes "experimental" from the help text where depends on CONFIG_EXPERIMENTAL was already removed. Signed-off-by: Georg Hofmann <georg@hofmannsweb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08Merge branch 'master' of ↵David S. Miller2-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-05-07 1) Always verify length of provided sadb_key to fix a slab-out-of-bounds read in pfkey_add. From Kevin Easton. 2) Make sure that all states are really deleted before we check that the state lists are empty. Otherwise we trigger a warning. 3) Fix MTU handling of the VTI6 interfaces on interfamily tunnels. From Stefano Brivio. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"Ido Schimmel1-0/+3
This reverts commit edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6"). Eric reported a division by zero in rt6_multipath_rebalance() which is caused by above commit that considers identical local routes to be siblings. The division by zero happens because a nexthop weight is not set for local routes. Revert the commit as it does not fix a bug and has side effects. To reproduce: # ip -6 address add 2001:db8::1/64 dev dummy0 # ip -6 address add 2001:db8::1/64 dev dummy1 Fixes: edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01ipv6: Allow non-gateway ECMP for IPv6Thomas Winter1-3/+0
It is valid to have static routes where the nexthop is an interface not an address such as tunnels. For IPv4 it was possible to use ECMP on these routes but not for IPv6. Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Cc: David Ahern <dsahern@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01ipv6: fix uninit-value in ip6_multipath_l3_keys()Eric Dumazet1-1/+6
syzbot/KMSAN reported an uninit-value in ip6_multipath_l3_keys(), root caused to a bad assumption of ICMP header being already pulled in skb->head ip_multipath_l3_keys() does the correct thing, so it is an IPv6 only bug. BUG: KMSAN: uninit-value in ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline] BUG: KMSAN: uninit-value in rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858 CPU: 0 PID: 4507 Comm: syz-executor661 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline] rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858 ip6_route_input+0x65a/0x920 net/ipv6/route.c:1884 ip6_rcv_finish+0x413/0x6e0 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:288 [inline] ipv6_rcv+0x1e16/0x2340 net/ipv6/ip6_input.c:208 __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701 netif_receive_skb+0x230/0x240 net/core/dev.c:4725 tun_rx_batched drivers/net/tun.c:1555 [inline] tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 call_write_iter include/linux/fs.h:1782 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x7fb/0x9f0 fs/read_write.c:482 vfs_write+0x463/0x8d0 fs/read_write.c:544 SYSC_write+0x172/0x360 fs/read_write.c:589 SyS_write+0x55/0x80 fs/read_write.c:581 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 23aebdacb05d ("ipv6: Compute multipath hash for ICMP errors from offending packet") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 tooStefano Brivio1-2/+2
A vti6 interface can carry IPv4 as well, so it makes no sense to enforce a minimum MTU of IPV6_MIN_MTU. If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be disabled on the interface, courtesy of addrconf_notify(). Reported-by: Xin Long <lucien.xin@gmail.com> Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking") Fixes: c6741fbed6dc ("vti6: Properly adjust vti6 MTU from MTU of lower device") Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-27/+28
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Fix SIP conntrack with phones sending session descriptions for different media types but same port numbers, from Florian Westphal. 2) Fix incorrect rtnl_lock mutex logic from IPVS sync thread, from Julian Anastasov. 3) Skip compat array allocation in ebtables if there is no entries, also from Florian. 4) Do not lose left/right bits when shifting marks from xt_connmark, from Jack Ma. 5) Silence false positive memleak in conntrack extensions, from Cong Wang. 6) Fix CONFIG_NF_REJECT_IPV6=m link problems, from Arnd Bergmann. 7) Cannot kfree rule that is already in list in nf_tables, switch order so this error handling is not required, from Florian Westphal. 8) Release set name in error path, from Florian. 9) include kmemleak.h in nf_conntrack_extend.c, from Stepheh Rothwell. 10) NAT chain and extensions depend on NF_TABLES. 11) Out of bound access when renaming chains, from Taehee Yoo. 12) Incorrect casting in xt_connmark leads to wrong bitshifting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policyEric Dumazet1-0/+2
KMSAN reported use of uninit-value that I tracked to lack of proper size check on RTA_TABLE attribute. I also believe RTA_PREFSRC lacks a similar check. Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config") Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pktsAhmed Abdelsalam1-1/+1
In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src() in order to set the src addr of outer IPv6 header. The net_device is required for set_tun_src(). However calling ip6_dst_idev() on dst_entry in case of IPv4 traffic results on the following bug. Using just dst->dev should fix this BUG. [ 196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0 [ 196.243329] Oops: 0000 [#1] SMP PTI [ 196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci [ 196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1 [ 196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300 [ 196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202 [ 196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000 [ 196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850 [ 196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800 [ 196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808 [ 196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200 [ 196.246846] FS: 00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000 [ 196.247286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0 [ 196.247804] Call Trace: [ 196.247972] seg6_do_srh+0x15b/0x1c0 [ 196.248156] seg6_output+0x3c/0x220 [ 196.248341] ? prandom_u32+0x14/0x20 [ 196.248526] ? ip_idents_reserve+0x6c/0x80 [ 196.248723] ? __ip_select_ident+0x90/0x100 [ 196.248923] ? ip_append_data.part.50+0x6c/0xd0 [ 196.249133] lwtunnel_output+0x44/0x70 [ 196.249328] ip_send_skb+0x15/0x40 [ 196.249515] raw_sendmsg+0x8c3/0xac0 [ 196.249701] ? _copy_from_user+0x2e/0x60 [ 196.249897] ? rw_copy_check_uvector+0x53/0x110 [ 196.250106] ? _copy_from_user+0x2e/0x60 [ 196.250299] ? copy_msghdr_from_user+0xce/0x140 [ 196.250508] sock_sendmsg+0x36/0x40 [ 196.250690] ___sys_sendmsg+0x292/0x2a0 [ 196.250881] ? _cond_resched+0x15/0x30 [ 196.251074] ? copy_termios+0x1e/0x70 [ 196.251261] ? _copy_to_user+0x22/0x30 [ 196.251575] ? tty_mode_ioctl+0x1c3/0x4e0 [ 196.251782] ? _cond_resched+0x15/0x30 [ 196.251972] ? mutex_lock+0xe/0x30 [ 196.252152] ? vvar_fault+0xd2/0x110 [ 196.252337] ? __do_fault+0x1f/0xc0 [ 196.252521] ? __handle_mm_fault+0xc1f/0x12d0 [ 196.252727] ? __sys_sendmsg+0x63/0xa0 [ 196.252919] __sys_sendmsg+0x63/0xa0 [ 196.253107] do_syscall_64+0x72/0x200 [ 196.253305] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 196.253530] RIP: 0033:0x7fc4480b0690 [ 196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690 [ 196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003 [ 196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002 [ 196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070 [ 196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe [ 196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10 [ 196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60 [ 196.256445] CR2: 0000000000000000 [ 196.256676] ---[ end trace 71af7d093603885c ]--- Fixes: 8936ef7604c11 ("ipv6: sr: fix NULL pointer dereference when setting encap source address") Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19netfilter: nf_tables: NAT chain and extensions require NF_TABLESPablo Neira Ayuso1-27/+28
Move these options inside the scope of the 'if' NF_TABLES and NF_TABLES_IPV6 dependencies. This patch fixes: net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_nat_do_chain': >> net/ipv6/netfilter/nft_chain_nat_ipv6.c:37: undefined reference to `nft_do_chain' net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_exit': >> net/ipv6/netfilter/nft_chain_nat_ipv6.c:94: undefined reference to `nft_unregister_chain_type' net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_init': >> net/ipv6/netfilter/nft_chain_nat_ipv6.c:87: undefined reference to `nft_register_chain_type' that happens with: CONFIG_NF_TABLES=m CONFIG_NFT_CHAIN_NAT_IPV6=y Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-16xfrm: Fix warning in xfrm6_tunnel_net_exit.Steffen Klassert1-0/+3
We need to make sure that all states are really deleted before we check that the state lists are empty. Otherwise we trigger a warning. Fixes: baeb0dbbb5659 ("xfrm6_tunnel: exit_net cleanup check added") Reported-and-tested-by:syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-04-06net/ipv6: Increment OUTxxx counters after netfilter hookJeff Barnhill1-2/+5
At the end of ip6_forward(), IPSTATS_MIB_OUTFORWDATAGRAMS and IPSTATS_MIB_OUTOCTETS are incremented immediately before the NF_HOOK call for NFPROTO_IPV6 / NF_INET_FORWARD. As a result, these counters get incremented regardless of whether or not the netfilter hook allows the packet to continue being processed. This change increments the counters in ip6_forward_finish() so that it will not happen if the netfilter hook chooses to terminate the packet, which is similar to how IPv4 works. Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>