summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-07-30openvswitch: Re-add CONFIG_OPENVSWITCH_VXLANThomas Graf5-200/+235
This readds the config option CONFIG_OPENVSWITCH_VXLAN to avoid a hard dependency of OVS on VXLAN. It moves the VXLAN config compat code to vport-vxlan.c and allows compliation as a module. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: 2661371ace96 ("openvswitch: fix compilation when vxlan is a module") Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net: pktgen: Remove unused 'allocated_skbs' fieldBogdan Hamciuc1-2/+0
Field pktgen_dev.allocated_skbs had been written to, but never read from. The number of allocated skbs can be deduced anyway, from the total number of sent packets and the 'clone_skb' param. Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net: pktgen: Observe needed_headroom of the deviceBogdan Hamciuc1-1/+2
Allocate enough space so as not to force the outgoing net device to do skb_realloc_headroom(). Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30lwtunnel: Make lwtun_encaps[] staticThomas Graf1-1/+1
Any external user should use the registration API instead of accessing this directly. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net: Set sk_txhash from a random numberTom Herbert4-6/+6
This patch creates sk_set_txhash and eliminates protocol specific inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a random number instead of performing flow dissection. sk_set_txash is also allowed to be called multiple times for the same socket, we'll need this when redoing the hash for negative routing advice. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: fix bug in broadcast synch message create functionJon Maloy1-0/+3
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6 ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_build_bcast_sync_msg(), which carries initial synchronization data between two nodes at first contact and at re-contact. In this function, we missed to add synchronization data, with the effect that the broadcast link endpoints will fail to synchronize correctly at re-contact between a running and a restarted node. All other cases work as intended. With this commit, we fix this bug. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30packet: remove handling of tx_ring from prb_shutdown_retire_blk_timer()Tobias Klauser1-4/+2
Follow e8e85cc5eb57 ("packet: remove handling of tx_ring") and remove the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only called with tx_ring = 0. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27lwtunnel: use kfree_skb() instead of vanilla kfree()Dan Carpenter1-1/+1
kfree_skb() is correct here. Fixes: ffce41962ef6 ('lwtunnel: support dst output redirect function') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27tcp: tso: allow deferring under reordering stateEric Dumazet1-1/+1
While doing experiments with reordering resilience, we found linux senders were not able to send at full speed under reordering, because every incoming SACK was releasing one MSS. This patch removes the limitation, as we did for CWR state in commit a0ea700e409 ("tcp: tso: allow CA_CWR state in tcp_tso_should_defer()") Neal Cardwell had a concern about limited transmit so Yuchung conducted experiments on GFE and found nothing worth adding an extra check on fast path : if (icsk->icsk_ca_state == TCP_CA_Disorder && tcp_sk(sk)->reordering == sysctl_tcp_reordering) goto send_now; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27ipv6: Avoid rt6_probe() taking writer lock in the fast pathMartin KaFai Lau1-0/+4
The patch checks neigh->nud_state before acquiring the writer lock. Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF. 40 udpflood processes and a /64 gateway route are used. The gateway has NUD_PERMANENT. Each of them is run for 30s. At the end, the total number of finished sendto(): Before: 55M After: 95M Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Julian Anastasov <ja@ssi.bg> CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27ipv6: Re-arrange code in rt6_probe()Martin KaFai Lau1-24/+20
It is a prep work for the next patch to remove write_lock from rt6_probe(). 1. Reduce the number of if(neigh) check. From 4 to 1. 2. Bring the write_(un)lock() closer to the operations that the lock is protecting. Hopefully, the above make rt6_probe() more readable. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27lwtunnel: change prototype of lwtunnel_state_get()Nicolas Dichtel4-19/+9
It saves some lines and simplify a bit the code when the state is returning by this function. It's also useful to handle a NULL entry. To avoid too long lines, I've also renamed lwtunnel_state_get() and lwtunnel_state_put() to lwtstate_get() and lwtstate_put(). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27ipv6: copy lwtstate in ip6_rt_copy_init()Nicolas Dichtel1-0/+4
We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc() use ip6_rt_copy_init() to build a dst). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27ipv6: use lwtunnel_output6() only if flag redirect is setNicolas Dichtel1-1/+2
This function make sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set. The check is already done in IPv4. CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27dev: Spelling fix in commentssubashab@codeaurora.org1-2/+2
Fix the following typo - unchainged -> unchanged Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27bridge: mdb: notify on router port add and delSatish Ashok3-2/+77
Send notifications on router port add and del/expire, re-use the already existing MDBA_ROUTER and send NEWMDB/DELMDB netlink notifications respectively. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27openvswitch: Retrieve tunnel metadata when receiving from vport-netdevThomas Graf1-1/+1
Retrieve the tunnel metadata for packets received by a net_device and provide it to ovs_vport_receive() for flow key extraction. [This hunk was in the GRE patch in the initial series and missed the cut for the initial submission for merging.] Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27openvswitch: fix compilation when vxlan is a moduleNicolas Dichtel1-0/+1
With CONFIG_VXLAN=m and CONFIG_OPENVSWITCH=y, there was the following compilation error: LD init/built-in.o net/built-in.o: In function `vxlan_tnl_create': .../net/openvswitch/vport-netdev.c:322: undefined reference to `vxlan_dev_create' make: *** [vmlinux] Error 1 CC: Thomas Graf <tgraf@suug.ch> Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27ipv4: be more aggressive when probing alternative gatewaysJulian Anastasov1-1/+3
Currently, we do not notice if new alternative gateways are added. We can do it by checking for present neigh entry. Also, gateways that are currently probed (NUD_INCOMPLETE) can be skipped from round-robin probing. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27ipv6: fix crash over flow-based vxlan deviceWei-Chun Chao1-2/+3
Similar check was added in ip_rcv but not in ipv6_rcv. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81734e0a>] ipv6_rcv+0xfa/0x500 Call Trace: [<ffffffff816c9786>] ? ip_rcv+0x296/0x400 [<ffffffff817732d2>] ? packet_rcv+0x52/0x410 [<ffffffff8168e99f>] __netif_receive_skb_core+0x63f/0x9a0 [<ffffffffc02b34a0>] ? br_handle_frame_finish+0x580/0x580 [bridge] [<ffffffff8109912c>] ? update_rq_clock.part.81+0x1c/0x40 [<ffffffff8168ed18>] __netif_receive_skb+0x18/0x60 [<ffffffff8168fa1f>] process_backlog+0x9f/0x150 Fixes: ee122c79d422 (vxlan: Flow based tunneling) Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27tipc: clean up socket layer message receptionJon Paul Maloy4-146/+134
When a message is received in a socket, one of the call chains tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv()) or tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv()) are followed. At each of these levels we may encounter situations where the message may need to be rejected, or a new message produced for transfer back to the sender. Despite recent improvements, the current code for doing this is perceived as awkward and hard to follow. Leveraging the two previous commits in this series, we now introduce a more uniform handling of such situations. We let each of the functions in the chain itself produce/reverse the message to be returned to the sender, but also perform the actual forwarding. This simplifies the necessary logics within each function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27tipc: introduce new tipc_sk_respond() functionJon Paul Maloy3-39/+47
Currently, we use the code sequence if (msg_reverse()) tipc_link_xmit_skb() at numerous locations in socket.c. The preparation of arguments for these calls, as well as the sequence itself, makes the code unecessarily complex. In this commit, we introduce a new function, tipc_sk_respond(), that performs this call combination. We also replace some, but not yet all, of these explicit call sequences with calls to the new function. Notably, we let the function tipc_sk_proto_rcv() use the new function to directly send out PROBE_REPLY messages, instead of deferring this to the calling tipc_sk_rcv() function, as we do now. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27tipc: let function tipc_msg_reverse() expand header when neededJon Paul Maloy3-34/+48
The shortest TIPC message header, for cluster local CONNECTED messages, is 24 bytes long. With this format, the fields "dest_node" and "orig_node" are optimized away, since they in reality are redundant in this particular case. However, the absence of these fields leads to code inconsistencies that are difficult to handle in some cases, especially when we need to reverse or reject messages at the socket layer. In this commit, we concentrate the handling of the absent fields to one place, by letting the function tipc_msg_reverse() reallocate the buffer and expand the header to 32 bytes when necessary. This means that the socket code now can assume that the two previously absent fields are present in the header when a message needs to be rejected. This opens up for some further simplifications of the socket code. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-25bridge: Fix setting a flag in br_fill_ifvlaninfo_range().Rosen, Rami1-2/+0
This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is unneeded, as vinfo.flags value is overriden by the immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END assignement. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23ip_tunnel: Call ip_tunnel_core_init() from inet_init()Thomas Graf2-10/+4
Convert the module_init() to a invocation from inet_init() since ip_tunnel_core is part of the INET built-in. Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller32-136/+298
Conflicts: net/bridge/br_mdb.c br_mdb.c conflict was a function call being removed to fix a bug in 'net' but whose signature was changed in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds31-135/+295
Pull networking fixes from David Miller: 1) Don't use shared bluetooth antenna in iwlwifi driver for management frames, from Emmanuel Grumbach. 2) Fix device ID check in ath9k driver, from Felix Fietkau. 3) Off by one in xen-netback BUG checks, from Dan Carpenter. 4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann. 5) Fix races in setting peeked bit flag in SKBs during datagram receive. If it's shared we have to clone it otherwise the value can easily be corrupted. Fix from Herbert Xu. 6) Revert fec clock handling change, causes regressions. From Fabio Estevam. 7) Fix use after free in fq_codel and sfq packet schedulers, from WANG Cong. 8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.) from WANG Cong and Konstantin Khlebnikov. 9) Memory leak in act_bpf packet action, from Alexei Starovoitov. 10) ARM bpf JIT bug fixes from Nicolas Schichan. 11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from Michael S Tsirkin. 12) Destruction of bond with different ARP header types not handled correctly, fix from Nikolay Aleksandrov. 13) Revert GRO receive support in ipv6 SIT tunnel driver, causes regressions because the GRO packets created cannot be processed properly on the GSO side if we forward the frame. From Herbert Xu. 14) TCCR update race and other fixes to ravb driver from Sergei Shtylyov. 15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet. 16) Fix panics on packet scheduler filter replace, from Daniel Borkmann. 17) Make sure AF_PACKET sees properly IP headers in defragmented frames (via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee. 18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits) ravb: fix ring memory allocation net: phy: dp83867: Fix warning check for setting the internal delay openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes netlink: don't hold mutex in rcu callback when releasing mmapd ring ARM: net: fix vlan access instructions in ARM JIT. ARM: net: handle negative offsets in BPF JIT. ARM: net: fix condition for load_order > 0 when translating load instructions. tcp: suppress a division by zero warning drivers: net: cpsw: remove tx event processing in rx napi poll inet: frags: fix defragmented packet's IP header for af_packet net: mvneta: fix refilling for Rx DMA buffers stmmac: fix setting of driver data in stmmac_dvr_probe sched: cls_flow: fix panic on filter replace sched: cls_flower: fix panic on filter replace sched: cls_bpf: fix panic on filter replace net/mdio: fix mdio_bus_match for c45 PHY net: ratelimit warnings about dst entry refcount underflow or overflow caif: fix leaks and race in caif_queue_rcv_skb() qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355 ravb: fix race updating TCCR ...
2015-07-22ipv6: sysctl to restrict candidate source addressesErik Kline1-3/+19
Per RFC 6724, section 4, "Candidate Source Addresses": It is RECOMMENDED that the candidate source addresses be the set of unicast addresses assigned to the interface that will be used to send to the destination (the "outgoing" interface). Add a sysctl to enable this behaviour. Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereferenceRoopa Prabhu1-1/+1
fix for: net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces) remove incorrect rcu_dereference possibly left over from earlier revisions of the code. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22net: track success and failure of TCP PMTU probingRick Jones2-0/+4
Track success and failure of TCP PMTU probing. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22mpls: make RTA_OIF optionalRoopa Prabhu1-1/+66
If user did not specify an oif, try and get it from the via address. If failed to get device, return with -ENODEV. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodesChris J Arges1-1/+1
Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and assumes the node variable returned by for_each_node will index into flow->stats[node]. Use nr_node_ids to allocate a maximal sparse array instead of num_possible_nodes(). The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot. Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861 Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22netlink: don't hold mutex in rcu callback when releasing mmapd ringFlorian Westphal1-32/+47
Kirill A. Shutemov says: This simple test-case trigers few locking asserts in kernel: int main(int argc, char **argv) { unsigned int block_size = 16 * 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70 #2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0 Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20 CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98 Call Trace: <IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270 [<ffffffff81085bed>] __might_sleep+0x4d/0x90 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150 [<ffffffff817e484d>] __sk_free+0x1d/0x160 [<ffffffff817e49a9>] sk_free+0x19/0x20 [..] Cong Wang says: We can't hold mutex lock in a rcu callback, [..] Thomas Graf says: The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Diagnosed-by: Cong Wang <cwang@twopensource.com> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22tcp: suppress a division by zero warningEric Dumazet1-6/+5
Andrew Morton reported following warning on one ARM build with gcc-4.4 : net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc': net/ipv4/inet_hashtables.c:617: warning: division by zero Even guarded with a test on sizeof(spinlock_t), compiler does not like current construct on a !CONFIG_SMP build. Remove the warning by using a temporary variable. Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()") Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22tipc: fix compatibility bugJon Paul Maloy1-1/+1
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6 ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_link_proto_rcv(). This function contains a bug, so that it sometimes by error sends out a non-zero link priority value in created protocol messages. The bug may lead to an extra link reset at initial link establising with older nodes. This will never happen more than once, whereafter the link will work as intended. We fix this bug in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22net: #ifdefify sk_classid member of struct sockMathias Krause1-0/+4
The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is enabled. #ifdefify it to reduce the size of struct sock on 32 bit systems, at least. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21openvswitch: Use regular VXLAN net_device deviceThomas Graf7-361/+218
This gets rid of all OVS specific VXLAN code in the receive and transmit path by using a VXLAN net_device to represent the vport. Only a small shim layer remains which takes care of handling the VXLAN specific OVS Netlink configuration. Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb() since they are no longer needed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21openvswitch: Abstract vport name through ovs_vport_name()Thomas Graf6-12/+9
This allows to get rid of the get_name() vport ops later on. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21openvswitch: Move dev pointer into vport itselfThomas Graf6-91/+59
This is the first step in representing all OVS vports as regular struct net_devices. Move the net_device pointer into the vport structure itself to get rid of struct vport_netdev. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21openvswitch: Make tunnel set action attach a metadata dstThomas Graf6-13/+79
Utilize the new metadata dst to attach encapsulation instructions to the skb. The existing egress_tun_info via the OVS_CB() is left in place until all tunnel vports have been converted to the new method. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21fib: Add fib rule match on tunnel idThomas Graf2-2/+38
This add the ability to select a routing table based on the tunnel id which allows to maintain separate routing tables for each virtual tunnel network. ip rule add from all tunnel-id 100 lookup 100 ip rule add from all tunnel-id 200 lookup 200 A new static key controls the collection of metadata at tunnel level upon demand. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21route: Per route IP tunnel metadata via lightweight tunnelThomas Graf3-1/+116
This introduces a new IP tunnel lightweight tunnel type which allows to specify IP tunnel instructions per route. Only IPv4 is supported at this point. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21route: Extend flow representation with tunnel keyThomas Graf2-0/+10
Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to allow routes to match on tunnel metadata. For now, the tunnel id is added to flowi_tunnel which allows for routes to be bound to specific virtual tunnels. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21arp: Inherit metadata dst when creating ARP requestsThomas Graf1-28/+37
If output device wants to see the dst, inherit the dst of the original skb and pass it on to generate the ARP request. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21dst: Metadata destinationsThomas Graf4-16/+75
Introduces a new dst_metadata which enables to carry per packet metadata between forwarding and processing elements via the skb->dst pointer. The structure is set up to be a union. Thus, each separate type of metadata requires its own dst instance. If demand arises to carry multiple types of metadata concurrently, metadata dst entries can be made stackable. The metadata dst entry is refcnt'ed as expected for now but a non reference counted use is possible if the reference is forced before queueing the skb. In order to allow allocating dsts with variable length, the existing dst_alloc() is split into a dst_alloc() and dst_init() function. The existing dst_init() function to initialize the subsystem is being renamed to dst_subsys_init() to make it clear what is what. The check before ip_route_input() is changed to ignore metadata dsts and drop the dst inside the routing function thus allowing to interpret metadata in a later commit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21icmp: Don't leak original dst into ip_route_input()Thomas Graf1-0/+1
ip_route_input() unconditionally overwrites the dst. Hide the original dst attached to the skb by calling skb_dst_set(skb, NULL) prior to ip_route_input(). Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel genericThomas Graf11-134/+64
Rename the tunnel metadata data structures currently internal to OVS and make them generic for use by all IP tunnels. Both structures are kernel internal and will stay that way. Their members are exposed to user space through individual Netlink attributes by OVS. It will therefore be possible to extend/modify these structures without affecting user ABI. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21mpls: ip tunnel supportRoopa Prabhu3-1/+241
This implementation uses lwtunnel infrastructure to register hooks for mpls tunnel encaps. It picks cues from iptunnel_encaps infrastructure and previous mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21mpls: export mpls functions for use by mpls iptunnelsRoopa Prabhu2-5/+15
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21ipv6: rt6_info output redirect to tunnel outputRoopa Prabhu1-0/+1
This is similar to ipv4 redirect of dst output to lwtunnel output function for encapsulation and xmit. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>