summaryrefslogtreecommitdiff
path: root/drivers/net/vxlan.c
AgeCommit message (Collapse)AuthorFilesLines
2015-01-06Fix race condition between vxlan_sock_add and vxlan_sock_releaseMarcelo Leitner1-7/+3
[ Upstream commit 00c83b01d58068dfeb2e1351cca6fccf2a83fa8f ] Currently, when trying to reuse a socket, vxlan_sock_add will grab vn->sock_lock, locate a reusable socket, inc refcount and release vn->sock_lock. But vxlan_sock_release() will first decrement refcount, and then grab that lock. refcnt operations are atomic but as currently we have deferred works which hold vs->refcnt each, this might happen, leading to a use after free (specially after vxlan_igmp_leave): CPU 1 CPU 2 deferred work vxlan_sock_add ... ... spin_lock(&vn->sock_lock) vs = vxlan_find_sock(); vxlan_sock_release dec vs->refcnt, reaches 0 spin_lock(&vn->sock_lock) vxlan_sock_hold(vs), refcnt=1 spin_unlock(&vn->sock_lock) hlist_del_rcu(&vs->hlist); vxlan_notify_del_rx_port(vs) spin_unlock(&vn->sock_lock) So when we look for a reusable socket, we check if it wasn't freed already before reusing it. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Fixes: 7c47cedf43a8b3 ("vxlan: move IGMP join/leave to work queue") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-19vxlan: Do not reuse sockets for a different address familyMarcelo Leitner1-10/+18
[ Upstream commit 19ca9fc1445b76b60d34148f7ff837b055f5dcf3 ] Currently, we only match against local port number in order to reuse socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will follow. The following steps reproduce it: # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \ srcport 5000 6000 dev eth0 # ip link add vxlan7 type vxlan id 43 group ff0e::110 \ srcport 5000 6000 dev eth0 # ip link set vxlan6 up # ip link set vxlan7 up <panic> [ 4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 ... [ 4.188076] Call Trace: [ 4.188085] [<ffffffff81667c4a>] ? ipv6_sock_mc_join+0x3a/0x630 [ 4.188098] [<ffffffffa05a6ad6>] vxlan_igmp_join+0x66/0xd0 [vxlan] [ 4.188113] [<ffffffff810a3430>] process_one_work+0x220/0x710 [ 4.188125] [<ffffffff810a33c4>] ? process_one_work+0x1b4/0x710 [ 4.188138] [<ffffffff810a3a3b>] worker_thread+0x11b/0x3a0 [ 4.188149] [<ffffffff810a3920>] ? process_one_work+0x710/0x710 So address family must also match in order to reuse a socket. Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-05vxlan: fix a free after useLi RongQing1-0/+1
[ Upstream commit 7a9f526fc3ee49b6034af2f243676ee0a27dcaa8 ] pskb_may_pull maybe change skb->data and make eth pointer oboslete, so eth needs to reload Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05vxlan: using pskb_may_pull as early as possibleLi RongQing1-4/+2
[ Upstream commit 91269e390d062b526432f2ef1352b8df82e0e0bc ] pskb_may_pull should be used to check if skb->data has enough space, skb->len can not ensure that. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05vxlan: fix a use after free in vxlan_encap_bypassLi RongQing1-3/+5
[ Upstream commit ce6502a8f9572179f044a4d62667c4645256d6e4 ] when netif_rx() is done, the netif_rx handled skb maybe be freed, and should not be used. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17vxlan: fix incorrect initializer in union vxlan_addrGerhard Stenzel1-4/+4
[ Upstream commit a45e92a599e77ee6a850eabdd0141633fde03915 ] The first initializer in the following union vxlan_addr ipa = { .sin.sin_addr.s_addr = tip, .sa.sa_family = AF_INET, }; is optimised away by the compiler, due to the second initializer, therefore initialising .sin.sin_addr.s_addr always to 0. This results in netlink messages indicating a L3 miss never contain the missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions. The problem affects user space programs relying on an IP address being sent as part of a netlink message indicating a L3 miss. Changing .sa.sa_family = AF_INET, to .sin.sin_family = AF_INET, fixes the problem. Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-23vxlan: use dev->needed_headroom instead of dev->hard_header_lenCong Wang1-4/+3
[ Upstream commit 2853af6a2ea1a8ed09b09dd4fb578e7f435e8d34 ] When we mirror packets from a vxlan tunnel to other device, the mirror device should see the same packets (that is, without outer header). Because vxlan tunnel sets dev->hard_header_len, tcf_mirred() resets mac header back to outer mac, the mirror device actually sees packets with outer headers Vxlan tunnel should set dev->needed_headroom instead of dev->hard_header_len, like what other ip tunnels do. This fixes the above problem. Cc: "David S. Miller" <davem@davemloft.net> Cc: stephen hemminger <stephen@networkplumber.org> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18net: vxlan: fix crash when interface is created with no groupMike Rapoport1-1/+5
[ Upstream commit 5933a7bbb5de66482ea8aa874a7ebaf8e67603c4 ] If the vxlan interface is created without explicit group definition, there are corner cases which may cause kernel panic. For instance, in the following scenario: node A: $ ip link add dev vxlan42 address 2c:c2:60:00:10:20 type vxlan id 42 $ ip addr add dev vxlan42 10.0.0.1/24 $ ip link set up dev vxlan42 $ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02 $ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst <IPv4 address> $ ping 10.0.0.2 node B: $ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42 $ ip addr add dev vxlan42 10.0.0.2/24 $ ip link set up dev vxlan42 $ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20 node B crashes: vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address) vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address) BUG: unable to handle kernel NULL pointer dereference at 0000000000000046 IP: [<ffffffff8143c459>] ip6_route_output+0x58/0x82 PGD 7bd89067 PUD 7bd4e067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221-dirty #154 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000 RIP: 0010:[<ffffffff8143c459>] [<ffffffff8143c459>] ip6_route_output+0x58/0x82 RSP: 0018:ffff88007fd03668 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040 RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754 RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000 R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740 R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0 Stack: 0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740 ffff88007fd036c8 ffffffff814320bb 0000000000006e49 ffff88007b8b7360 ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360 Call Trace: <IRQ> [<ffffffff814320bb>] ip6_dst_lookup_tail+0x2d/0xa4 [<ffffffff814322a5>] ip6_dst_lookup+0x10/0x12 [<ffffffff81323b4e>] vxlan_xmit_one+0x32a/0x68c [<ffffffff814a325a>] ? _raw_spin_unlock_irqrestore+0x12/0x14 [<ffffffff8104c551>] ? lock_timer_base.isra.23+0x26/0x4b [<ffffffff8132451a>] vxlan_xmit+0x66a/0x6a8 [<ffffffff8141a365>] ? ipt_do_table+0x35f/0x37e [<ffffffff81204ba2>] ? selinux_ip_postroute+0x41/0x26e [<ffffffff8139d0c1>] dev_hard_start_xmit+0x2ce/0x3ce [<ffffffff8139d491>] __dev_queue_xmit+0x2d0/0x392 [<ffffffff813b380f>] ? eth_header+0x28/0xb5 [<ffffffff8139d569>] dev_queue_xmit+0xb/0xd [<ffffffff813a5aa6>] neigh_resolve_output+0x134/0x152 [<ffffffff813db741>] ip_finish_output2+0x236/0x299 [<ffffffff813dc074>] ip_finish_output+0x98/0x9d [<ffffffff813dc749>] ip_output+0x62/0x67 [<ffffffff813da9f2>] dst_output+0xf/0x11 [<ffffffff813dc11c>] ip_local_out+0x1b/0x1f [<ffffffff813dcf1b>] ip_send_skb+0x11/0x37 [<ffffffff813dcf70>] ip_push_pending_frames+0x2f/0x33 [<ffffffff813ff732>] icmp_push_reply+0x106/0x115 [<ffffffff813ff9e4>] icmp_reply+0x142/0x164 [<ffffffff813ffb3b>] icmp_echo.part.16+0x46/0x48 [<ffffffff813c1d30>] ? nf_iterate+0x43/0x80 [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813ffb62>] icmp_echo+0x25/0x27 [<ffffffff814005f7>] icmp_rcv+0x1d2/0x20a [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813d810d>] ip_local_deliver_finish+0xd6/0x14f [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53 [<ffffffff813d82bf>] ip_local_deliver+0x4a/0x4f [<ffffffff813d7f7b>] ip_rcv_finish+0x253/0x26a [<ffffffff813d7d28>] ? inet_add_protocol+0x3e/0x3e [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53 [<ffffffff813d856a>] ip_rcv+0x2a6/0x2ec [<ffffffff8139a9a0>] __netif_receive_skb_core+0x43e/0x478 [<ffffffff812a346f>] ? virtqueue_poll+0x16/0x27 [<ffffffff8139aa2f>] __netif_receive_skb+0x55/0x5a [<ffffffff8139aaaa>] process_backlog+0x76/0x12f [<ffffffff8139add8>] net_rx_action+0xa2/0x1ab [<ffffffff81047847>] __do_softirq+0xca/0x1d1 [<ffffffff81047ace>] irq_exit+0x3e/0x85 [<ffffffff8100b98b>] do_IRQ+0xa9/0xc4 [<ffffffff814a37ad>] common_interrupt+0x6d/0x6d <EOI> [<ffffffff810378db>] ? native_safe_halt+0x6/0x8 [<ffffffff810110c7>] default_idle+0x9/0xd [<ffffffff81011694>] arch_cpu_idle+0x13/0x1c [<ffffffff8107480d>] cpu_startup_entry+0xbc/0x137 [<ffffffff8102e741>] start_secondary+0x1a0/0x1a5 Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89 RIP [<ffffffff8143c459>] ip6_route_output+0x58/0x82 RSP <ffff88007fd03668> CR2: 0000000000000046 ---[ end trace 4612329caab37efd ]--- When vxlan interface is created without explicit group definition, the default_dst protocol family is initialiazed to AF_UNSPEC and the driver assumes IPv4 configuration. On the other side, the default_dst protocol family is used to differentiate between IPv4 and IPv6 cases and, since, AF_UNSPEC != AF_INET, the processing takes the IPv6 path. Making the IPv4 assumption explicit by settting default_dst protocol family to AF_INET4 and preventing mixing of IPv4 and IPv6 addresses in snooped fdb entries fixes the corner case crashes. Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18vxlan: fix nonfunctional neigh_reduce()David Stevens1-14/+113
[ Upstream commit 4b29dba9c085a4fb79058fb1c45a2f6257ca3dfa ] The VXLAN neigh_reduce() code is completely non-functional since check-in. Specific errors: 1) The original code drops all packets with a multicast destination address, even though neighbor solicitations are sent to the solicited-node address, a multicast address. The code after this check was never run. 2) The neighbor table lookup used the IPv6 header destination, which is the solicited node address, rather than the target address from the neighbor solicitation. So neighbor lookups would always fail if it got this far. Also for L3MISSes. 3) The code calls ndisc_send_na(), which does a send on the tunnel device. The context for neigh_reduce() is the transmit path, vxlan_xmit(), where the host or a bridge-attached neighbor is trying to transmit a neighbor solicitation. To respond to it, the tunnel endpoint needs to do a *receive* of the appropriate neighbor advertisement. Doing a send, would only try to send the advertisement, encapsulated, to the remote destinations in the fdb -- hosts that definitely did not do the corresponding solicitation. 4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the isrouter flag in the advertisement. This has nothing to do with whether or not the target is a router, and generally won't be set since the tunnel endpoint is bridging, not routing, traffic. The patch below creates a proxy neighbor advertisement to respond to neighbor solicitions as intended, providing proper IPv6 support for neighbor reduction. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18vxlan: fix potential NULL dereference in arp_reduce()David Stevens1-0/+3
[ Upstream commit 7346135dcd3f9b57f30a5512094848c678d7143e ] This patch fixes a NULL pointer dereference in the event of an skb allocation failure in arp_reduce(). Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-02-06net/vxlan: Share RX skb de-marking and checksum checks with ovsOr Gerlitz1-11/+9
[ Upstream commit d0bc65557ad09a57b4db176e9e3ccddb26971453 ] Make sure the practice set by commit 0afb166 "vxlan: Add capability of Rx checksum offload for inner packet" is applied when the skb goes through the portion of the RX code which is shared between vxlan netdevices and ovs vxlan port instances. Cc: Joseph Gasparakis <joseph.gasparakis@intel.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-16vxlan: release rt when found circular routeFan Du1-1/+1
[ Upstream commit fffc15a5012e9052d3b236efc56840841a125416 ] Otherwise causing dst memory leakage. Have Checked all other type tunnel device transmit implementation, no such things happens anymore. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-30vxlan: Use RCU apis to access sk_user_data.Pravin B Shelar1-6/+3
Use of RCU api makes vxlan code easier to understand. It also fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference. In rare case without ACCESS_ONCE() compiler might omit vs on sk_user_data dereference. Compiler can use vs as alias for sk->sk_user_data, resulting in multiple sk_user_data dereference in rcu read context which could change. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-18vxlan: Avoid creating fdb entry with NULL destinationSridhar Samudrala1-9/+13
Commit afbd8bae9c798c5cdbe4439d3a50536b5438247c vxlan: add implicit fdb entry for default destination creates an implicit fdb entry for default destination. This results in an invalid fdb entry if default destination is not specified. For ex: ip link add vxlan1 type vxlan id 100 creates the following fdb entry 00:00:00:00:00:00 dev vxlan1 dst 0.0.0.0 self permanent This patch fixes this issue by creating an fdb entry only if a valid default destination is specified. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-16vxlan: Fix sparse warningsJoseph Gasparakis1-10/+8
This patch fixes sparse warnings when incorrectly handling the port number and using int instead of unsigned int iterating through &vn->sock_list[]. Keeping the port as __be16 also makes things clearer wrt endianess. Also, it was pointed out that vxlan_get_rx_port() had unnecessary checks which got removed. Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05vxlan: Notify drivers for listening UDP port changesJoseph Gasparakis1-1/+67
This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and ndo_del_rx_vxlan_port(). Drivers can get notifications through the above functions about changes of the UDP listening port of VXLAN. Also, when physical ports come up, now they can call vxlan_get_rx_port() in order to obtain the port number(s) of the existing VXLAN interface in case they already up before them. This information about the listening UDP port would be used for VXLAN related offloads. A big thank you to John Fastabend (john.r.fastabend@intel.com) for his input and his suggestions on this patch set. CC: John Fastabend <john.r.fastabend@intel.com> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04vxlan: Optimize vxlan rcvPravin B Shelar1-1/+6
vxlan-udp-recv function lookup vxlan_sock struct on every packet recv by using udp-port number. we can use sk->sk_user_data to store vxlan_sock and avoid lookup. I have open coded rcu-api to store and read vxlan_sock from sk_user_data to avoid sparse warning as sk_user_data is not __rcu pointer. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04tunnels: harmonize cleanup done on skb on xmit pathNicolas Dichtel1-2/+4
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04vxlan: remove net arg from vxlan[6]_xmit_skb()Nicolas Dichtel1-4/+4
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04iptunnels: remove net arg from iptunnel_xmit()Nicolas Dichtel1-2/+1
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04drivers/net: Convert uses of compare_ether_addr to ether_addr_equalJoe Perches1-4/+3
Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: (and a little typing) $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03vxlan: include net/ip6_checksum.h for csum_ipv6_magic()Cong Wang1-0/+1
Fengguang reported a compile warning: drivers/net/vxlan.c: In function 'vxlan6_xmit_skb': drivers/net/vxlan.c:1352:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors this patch fixes it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03vxlan: fix flowi6_proto valueCong Wang1-1/+1
It should be IPPROTO_UDP. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-01vxlan: add ipv6 proxy supportCong Wang1-2/+80
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-01vxlan: add ipv6 route short circuit supportCong Wang1-2/+28
route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-01vxlan: add ipv6 supportCong Wang1-146/+618
This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <dlstevens@us.ibm.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21vxlan: using kfree_rcu() to simplify the codeWei Yongjun1-7/+1
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Add tx-vlan offload support.Pravin B Shelar1-1/+15
Following patch allows transmit side vlan offload for vxlan devices. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Improve vxlan headroom calculation.Pravin B Shelar1-4/+9
Rather than having static headroom calculation, adjust headroom according to target device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Factor out vxlan send api.Pravin B Shelar1-37/+54
Following patch allows more code sharing between vxlan and ovs-vxlan. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Extend vxlan handlers for openvswitch.Pravin B Shelar1-26/+19
Following patch adds data field to vxlan socket and export vxlan handler api. vh->data is required to store private data per vxlan handler. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Add vxlan recv demux.Pravin B Shelar1-36/+67
Once we have ovs-vxlan functionality, one UDP port can be assigned to kernel-vxlan or ovs-vxlan port. Therefore following patch adds vxlan demux functionality, so that vxlan or ovs module can register for particular port. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Restructure vxlan receive.Pravin B Shelar1-15/+7
Use iptunnel_pull_header() for better code sharing. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20vxlan: Restructure vxlan socket apis.Pravin B Shelar1-41/+51
Restructure vxlan-socket management APIs so that it can be shared between vxlan and ovs modules. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> v6-v7: - get rid of zero refcnt vs from hashtable. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+1
2013-08-09vxlan: fix a soft lockup in vxlan module removalCong Wang1-2/+0
This is a regression introduced by: commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8 Author: stephen hemminger <stephen@networkplumber.org> Date: Sat Jul 13 10:18:18 2013 -0700 vxlan: add necessary locking on device removal The problem is that vxlan_dellink(), which is called with RTNL lock held, tries to flush the workqueue synchronously, but apparently igmp_join and igmp_leave work need to hold RTNL lock too, therefore we have a soft lockup! As suggested by Stephen, probably the flush_workqueue can just be removed and let the normal refcounting work. The workqueue has a reference to device and socket, therefore the cleanups should work correctly. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Tested-by: Cong Wang <amwang@redhat.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09vxlan: fix a regression of igmp joinCong Wang1-1/+1
This is a regression introduced by: commit 3fc2de2faba387218bdf9dbc6b13f513ac3b060a Author: stephen hemminger <stephen@networkplumber.org> Date: Thu Jul 18 08:40:15 2013 -0700 vxlan: fix igmp races Before this commit, the old code was: if (vxlan_group_used(vn, vxlan->default_dst.remote_ip)) ip_mc_join_group(sk, &mreq); else ip_mc_leave_group(sk, &mreq); therefore we shoud check vxlan_group_used(), not its opposite, for igmp_join. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05vxlan: fix rcu related warningstephen hemminger1-5/+11
Vxlan remote list is protected by RCU and guaranteed to be non-empty. Split out the rcu and non-rcu access to the list to fix warning Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-16/+41
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24vxlan fdb replace an existing entryThomas Richter1-0/+38
Add support to replace an existing entry found in the vxlan fdb database. The entry in question is identified by its unicast mac address and the destination information is changed. If the entry is not found, it is added in the forwarding database. This is similar to changing an entry in the neighbour table. Multicast mac addresses can not be changed with the replace option. This is useful for virtual machine migration when the destination of a target virtual machine changes. The replace feature can be used instead of delete followed by add. Resubmitted because net-next was closed last week. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-20vxlan: fix igmp racesstephen hemminger1-15/+38
There are two race conditions in existing code for doing IGMP management in workqueue in vxlan. First, the vxlan_group_used function checks the list of vxlan's without any protection, and it is possible for open followed by close to occur before the igmp work queue runs. To solve these move the check into vxlan_open/stop so it is protected by RTNL. And split into two work structures so that there is no racy reference to underlying device state. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-20vxlan: unregister on namespace exitstephen hemminger1-1/+3
Fix memory leaks and other badness from VXLAN network namespace teardown. When network namespace is removed, all the vxlan devices should be unregistered (not closed). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-17vxlan: add necessary locking on device removalstephen hemminger1-0/+6
The socket management is now done in workqueue (outside of RTNL) and protected by vn->sock_lock. There were two possible bugs, first the vxlan device was removed from the VNI hash table per socket without holding lock. And there was a race when device is created and the workqueue could run after deletion. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11vxlan: Fix kernel crash on rmmod.Pravin B Shelar1-1/+1
vxlan exit module unregisters vxlan net and then it unregisters rtnl ops which triggers vxlan_dellink() from __rtnl_kill_links(). vxlan_dellink() deletes vxlan-dev from vxlan_list which has list-head in vxlan-net-struct but that is already gone due to net-unregister. That is how we are getting following crash. Following commit fixes the crash by fixing module exit path. BUG: unable to handle kernel paging request at ffff8804102c8000 IP: [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0 PGD 2972067 PUD 83e019067 PMD 83df97067 PTE 80000004102c8060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: --- CPU: 19 PID: 6712 Comm: rmmod Tainted: GF 3.10.0+ #95 Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.4.8 10/25/2012 task: ffff88080c47c580 ti: ffff88080ac50000 task.ti: ffff88080ac50000 RIP: 0010:[<ffffffff812cc5e9>] [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0 RSP: 0018:ffff88080ac51e08 EFLAGS: 00010206 RAX: ffff8804102c8000 RBX: ffff88040f0d4b10 RCX: dead000000200200 RDX: ffff8804102c8000 RSI: ffff88080ac51e58 RDI: ffff88040f0d4b10 RBP: ffff88080ac51e08 R08: 0000000000000001 R09: 2222222222222222 R10: 2222222222222222 R11: 2222222222222222 R12: ffff88080ac51e58 R13: ffffffffa07b8840 R14: ffffffff81ae48c0 R15: ffff88080ac51e58 FS: 00007f9ef105c700(0000) GS:ffff88082a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8804102c8000 CR3: 00000008227e5000 CR4: 00000000000407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88080ac51e28 ffffffff812cc6a1 2222222222222222 ffff88040f0d4000 ffff88080ac51e48 ffffffffa07b3311 ffff88040f0d4000 ffffffff81ae49c8 ffff88080ac51e98 ffffffff81492fc2 ffff88080ac51e58 ffff88080ac51e58 Call Trace: [<ffffffff812cc6a1>] list_del+0x11/0x40 [<ffffffffa07b3311>] vxlan_dellink+0x51/0x70 [vxlan] [<ffffffff81492fc2>] __rtnl_link_unregister+0xa2/0xb0 [<ffffffff8149448e>] rtnl_link_unregister+0x1e/0x30 [<ffffffffa07b7b7c>] vxlan_cleanup_module+0x1c/0x2f [vxlan] [<ffffffff810c9b31>] SyS_delete_module+0x1d1/0x2c0 [<ffffffff812b8a0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81582f42>] system_call_fastpath+0x16/0x1b Code: eb 9f 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 RIP [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0 RSP <ffff88080ac51e08> CR2: ffff8804102c8000 Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-26vxlan: fix function name spellingStephen Hemminger1-3/+3
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: fdb: allow specifying multiple destinations for zero MACMike Rapoport1-1/+2
The zero MAC entry in the fdb is used as default destination. With multiple default destinations it is possible to use vxlan in environments that disable multicast on the infrastructure level, e.g. public clouds. Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: allow removal of single destination from fdb entryMike Rapoport1-4/+40
When the last item is deleted from the remote destinations list, the fdb entry is destroyed. Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: introduce vxlan_fdb_parseMike Rapoport1-31/+50
which will be reused by vxlan_fdb_delete Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: introduce vxlan_fdb_find_rdstMike Rapoport1-5/+18
which will be reused by vxlan_fdb_delete Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2013-06-25vxlan: add implicit fdb entry for default destinationMike Rapoport1-19/+49
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>