summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2014-06-06MPLS: Use mpls_features to activate software MPLS GSO segmentationSimon Horman1-1/+27
If an MPLS packet requires segmentation then use mpls_features to determine if the software implementation should be used. As no driver advertises MPLS GSO segmentation this will always be the case. I had not noticed that this was necessary before as software MPLS GSO segmentation was already being used in my test environment. I believe that the reason for that is the skbs in question always had fragments and the driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the case for most drivers). Thus software segmentation was activated by skb_gso_ok(). This introduces the overhead of an extra call to skb_network_protocol() in the case where where CONFIG_NET_MPLS_GSO is set and skb->ip_summed == CHECKSUM_NONE. Thanks to Jesse Gross for prompting me to investigate this. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05ipv4: use skb frags api in udp4_hwcsum()WANG Cong1-4/+5
Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: use the new API kvfree()WANG Cong11-58/+12
It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05dns_resolver: Do not accept domain names longer than 255 charsManuel Schölling1-2/+2
According to RFC1035 "[...] the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less." Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)Tom Herbert1-1/+1
Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05gre: Call gso_make_checksumTom Herbert8-4/+17
Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: Add GSO support for UDP tunnels with checksumTom Herbert6-23/+28
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05tcp: Call gso_make_checksumTom Herbert1-5/+2
Call common gso_make_checksum when calculating checksum for a TCP GSO segment. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: Support for multiple checksums with gsoTom Herbert2-1/+15
When creating a GSO packet segment we may need to set more than one checksum in the packet (for instance a TCP checksum and UDP checksum for VXLAN encapsulation). To be efficient, we want to do checksum calculation for any part of the packet at most once. This patch adds csum_start offset to skb_gso_cb. This tracks the starting offset for skb->csum which is initially set in skb_segment. When a protocol needs to compute a transport checksum it calls gso_make_checksum which computes the checksum value from the start of transport header to csum_start and then adds in skb->csum to get the full checksum. skb->csum and csum_start are then updated to reflect the checksum of the resultant packet starting from the transport header. This patch also adds a flag to skbuff, encap_hdr_csum, which is set in *gso_segment fucntions to indicate that a tunnel protocol needs checksum calculation Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05l2tp: call udp{6}_set_csumTom Herbert1-49/+5
Call common functions to set checksum for UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05udp: Generic functions to set checksumTom Herbert2-0/+75
Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: Revert "fib_trie: use seq_file_net rather than seq->private"Sasha Levin1-1/+1
This reverts commit 30f38d2fdd79f13fc929489f7e6e517b4a4bfe63. fib_triestat is surrounded by a big lie: while it claims that it's a seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't: static const struct file_operations fib_triestat_fops = { .owner = THIS_MODULE, .open = fib_triestat_seq_open, .read = seq_read, .llseek = seq_lseek, .release = single_release_net, }; Yes, fib_triestat is just a regular file. A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files you could do seq_file_net() to get the net ptr, doing so for a regular file would be wrong and would dereference an invalid pointer. The fib_triestat lie claimed a victim, and trying to show the file would be bad for the kernel. This patch just reverts the issue and fixes fib_triestat, which still needs a rewrite to either be a seq_file or stop claiming it is. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-44/+131
Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04net: remove some unless free on failure in alloc_netdev_mqs()WANG Cong1-5/+0
When we jump to free_pcpu on failure in alloc_netdev_mqs() rx and tx queues are not yet allocated, so no need to free them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04rtnetlink: fix a memory leak when ->newlink failsCong Wang1-3/+7
It is possible that ->newlink() fails before registering the device, in this case we should just free it, it's safe to call free_netdev(). Fixes: commit 0e0eee2465df77bcec2 (net: correct error path in rtnl_newlink()) Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04xfrm: fix race between netns cleanup and state expire notificationMichal Kubecek1-11/+25
The xfrm_user module registers its pernet init/exit after xfrm itself so that its net exit function xfrm_user_net_exit() is executed before xfrm_net_exit() which calls xfrm_state_fini() to cleanup the SA's (xfrm states). This opens a window between zeroing net->xfrm.nlsk pointer and deleting all xfrm_state instances which may access it (via the timer). If an xfrm state expires in this window, xfrm_exp_state_notify() will pass null pointer as socket to nlmsg_multicast(). As the notifications are called inside rcu_read_lock() block, it is sufficient to retrieve the nlsk socket with rcu_dereference() and check the it for null. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03Merge branch 'ethtool-rssh-fixes' of ↵David S. Miller1-62/+48
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next Ben Hutchings says: ==================== Pull request: Fixes for new ethtool RSS commands This addresses several problems I previously identified with the new ETHTOOL_{G,S}RSSH commands: 1. Missing validation of reserved parameters 2. Vague documentation 3. Use of unnamed magic number 4. No consolidation with existing driver operations I don't currently have access to suitable network hardware, but have tested these changes with a dummy driver that can support various combinations of operations and sizes, together with (a) Debian's ethtool 3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH and minor adjustment for fixes 1 and 3. v2: Update RSS operations in vmxnet3 too ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03ethtool: Check that reserved fields of struct ethtool_rxfh are 0Ben Hutchings1-53/+36
We should fail rather than silently ignoring use of these extensions. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-06-03ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()Ben Hutchings1-4/+4
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers regardless of whether they expose the hash key, unless you try to set a hash key for a driver that doesn't expose it. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-06-03bridge: Add bridge ifindex to bridge fdb notify msgsRoopa Prabhu1-0/+3
(This patch was previously posted as RFC at http://patchwork.ozlabs.org/patch/352677/) This patch adds NDA_MASTER attribute to neighbour attributes enum for bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs. Today bridge fdb notifications dont contain bridge information. Userspace can derive it from the port information in the fdb notification. However this is tricky in some scenarious. Example, bridge port delete notification comes before bridge fdb delete notifications. And we have seen problems in userspace when using libnl where, the bridge fdb delete notification handling code does not understand which bridge this fdb entry is part of because the bridge and port association has already been deleted. And these notifications (port membership and fdb) are generated on separate rtnl groups. Fixing the order of notifications could possibly solve the problem for some cases (I can submit a separate patch for that). This patch chooses to add NDA_MASTER to bridge fdb notify msgs because it not only solves the problem described above, but also helps userspace avoid another lookup into link msgs to derive the master index. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03net: filter: fix possible memory leak in __sk_prepare_filter()Leon Yu1-1/+6
__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter: rework/optimize internal BPF interpreter's instruction set) so that it should have uncharged memory once things went wrong. However that work isn't complete. Error is handled only in __sk_migrate_filter() while memory can still leak in the error path right after sk_chk_filter(). Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03tcp: fix cwnd undo on DSACK in F-RTOYuchung Cheng1-6/+5
This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03fib_trie: use seq_file_net rather than seq->privateDavid Ahern1-1/+1
Make fib_triestat_seq_show consistent with other /proc/net files and use seq_file_net. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03netlink: Only check file credentials for implicit destinationsEric W. Biederman1-1/+6
It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03net: fix inet_getid() and ipv6_select_ident() bugsEric Dumazet1-8/+3
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03bridge: Prevent insertion of FDB entry with disallowed vlanToshiaki Makita3-2/+37
br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02Merge branch 'for-davem' of ↵David S. Miller23-157/+645
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-06-02 Please pull this remaining batch of updates intended for the 3.16 stream... For the mac80211 bits, Johannes says: "The remainder for -next right now is mostly fixes, and a handful of small new things like some CSA infrastructure, the regdb script mW/dBm conversion change and sending wiphy notifications." For the bluetooth bits, Gustavo says: "Some more patches for 3.16. There is nothing really special here, just a bunch of clean ups, fixes plus some small improvements. Please pull." For the nfc bits, Samuel says: "We have: - Felica (Type3) tags support for trf7970a - Type 4b tags support for port100 - st21nfca DTS typo fix - A few sparse warning fixes" For the atheros bits, Kalle says: "Ben added support for setting antenna configurations. Michal improved warm reset so that we would not need to fall back to cold reset that often, an issue where ath10k stripped protected flag while in monitor mode and made module initialisation asynchronous to fix the problems with firmware loading when the driver is linked to the kernel. Luca removed unused channel_switch_beacon callbacks both from ath9k and ath10k. Marek fixed Protected Management Frames (PMF) when using Action Frames. Also we had other small fixes everywhere in the driver." Along with that, there are a handful of updates to a variety of drivers. This includes updates to at76c50x-usb, ath9k, b43, brcmfmac, mwifiex, rsi, rtlwifi, and wil6210. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02inetpeer: get rid of ip_id_countEric Dumazet12-113/+38
Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02net: Add support for device specific address syncingAlexander Duyck1-0/+85
This change provides a function to be used in order to break the ndo_set_rx_mode call into a set of address add and remove calls. The code is based on the implementation of dev_uc_sync/dev_mc_sync. Since they essentially do the same thing but with only one dev I simply named my functions __dev_uc_sync/__dev_mc_sync. I also implemented an unsync version of the functions as well to allow for cleanup on close. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-026lowpan_rtnl: fix off by one while fragmentationAlexander Aring1-1/+1
This patch fix a off by one error while fragmentation. If the frag_cap value is equal to skb_unprocessed value we need to stop the fragmentation loop because the last fragment which has a size of skb_unprocessed fits into the frag capability size. This issue was introduced by commit d4b2816d67d6e07b2f27037f282d8db03a5829d7 ("6lowpan: fix fragmentation"). Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-026lowpan_rtnl: fix fragmentation with two fragmentsAlexander Aring1-2/+2
This patch fix the 6LoWPAN fragmentation for the case if we have exactly two fragments. The problem is that the (skb_unprocessed >= frag_cap) condition is always false on the second fragment after sending the first fragment. A fragmentation with only one fragment doesn't make any sense. The solution is that we use a do while loop here, that ensures we sending always a minimum of two fragments if we need a fragmentation. This issue was introduced by commit d4b2816d67d6e07b2f27037f282d8db03a5829d7 ("6lowpan: fix fragmentation"). Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02genetlink: remove superfluous assignmentDenis ChengRq1-5/+1
the local variable ops and n_ops were just read out from family, and not changed, hence no need to assign back. Validation functions should operate on const parameters and not change anything. Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02Merge branch 'master' of ↵John W. Linville23-157/+645
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2014-06-02net: filter: improve filter block macrosDaniel Borkmann1-124/+72
Commit 9739eef13c92 ("net: filter: make BPF conversion more readable") started to introduce helper macros similar to BPF_STMT()/BPF_JUMP() macros from classic BPF. However, quite some statements in the filter conversion functions remained in the old style which gives a mixture of block macros and non block macros in the code. This patch makes the block macros itself more readable by using explicit member initialization, and converts the remaining ones where possible to remain in a more consistent state. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02net: filter: get rid of BPF_S_* enumDaniel Borkmann1-216/+125
This patch finally allows us to get rid of the BPF_S_* enum. Currently, the code performs unnecessary encode and decode workarounds in seccomp and filter migration itself when a filter is being attached in order to overcome BPF_S_* encoding which is not used anymore by the new interpreter resp. JIT compilers. Keeping it around would mean that also in future we would need to extend and maintain this enum and related encoders/decoders. We can get rid of all that and save us these operations during filter attaching. Naturally, also JIT compilers need to be updated by this. Before JIT conversion is being done, each compiler checks if A is being loaded at startup to obtain information if it needs to emit instructions to clear A first. Since BPF extensions are a subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements for extensions can be removed at that point. To ease and minimalize code changes in the classic JITs, we have introduced bpf_anc_helper(). Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int), arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we unfortunately didn't have access, but changes are analogous to the rest. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Chema Gonzalez <chemag@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02bridge: notify user space after fdb updateJon Maxwell1-1/+7
There has been a number incidents recently where customers running KVM have reported that VM hosts on different Hypervisors are unreachable. Based on pcap traces we found that the bridge was broadcasting the ARP request out onto the network. However some NICs have an inbuilt switch which on occasions were broadcasting the VMs ARP request back through the physical NIC on the Hypervisor. This resulted in the bridge changing ports and incorrectly learning that the VMs mac address was external. As a result the ARP reply was directed back onto the external network and VM never updated it's ARP cache. This patch will notify the bridge command, after a fdb has been updated to identify such port toggling. Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02bridge: fix the unbalanced promiscuous count when add_if failedwangweidong1-1/+1
As commit 2796d0c648c94 ("bridge: Automatically manage port promiscuous mode."), make the add_if use dev_set_allmulti instead of dev_set_promiscuous, so when add_if failed, we should do dev_set_allmulti(dev, -1). Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Reviewed-by: Amos Kong <akong@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02net: fix wrong mac_len calculation for vlansNikolay Aleksandrov1-10/+25
After 1e785f48d29a ("net: Start with correct mac_len in skb_network_protocol") skb->mac_len is used as a start of the calculation in skb_network_protocol() but that is not always correct. If skb->protocol == 8021Q/AD, usually the vlan header is already inserted in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters dev_hard_xmit it has mac_len == 0 so we take 2 bytes from the destination mac address (skb->data + VLAN_HLEN) as a type in skb_network_protocol() and return vlan_depth == 4. In the case where TSO is off, then the mac_len is set but it's == 18 (ETH_HLEN + VLAN_HLEN), so skb_network_protocol() returns a type from inside the packet and offset == 22. Also make vlan_depth unsigned as suggested before. As suggested by Eric Dumazet, move the while() loop in the if() so we can avoid additional testing in fast path. Here are few netperf tests + debug printk's to illustrate: cat netperf.tso-on.reorder-on.bugged - Vlan -> device (reorder on, default, this case is okay) MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 7111.54 [ 81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800 skb->mac_len 0 vlan_depth 0 type 0x800 - Vlan -> device (reorder off, bad) cat netperf.tso-on.reorder-off.bugged MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 241.35 [ 204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100 skb->mac_len 0 vlan_depth 4 type 0x5301 0x5301 are the last two bytes of the destination mac. And if we stop TSO, we may get even the following: [ 83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100 skb->mac_len 18 vlan_depth 22 type 0xb84 Because mac_len already accounts for VLAN_HLEN. After the fix: cat netperf.tso-on.reorder-off.fixed MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.01 5001.46 [ 81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100 skb->mac_len 0 vlan_depth 18 type 0x800 CC: Vlad Yasevich <vyasevic@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> CC: David S. Miller <davem@davemloft.net> Fixes:1e785f48d29a ("net: Start with correct mac_len in skb_network_protocol") Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller1-3/+3
Included changes: - prevent NULL dereference in multicast code Antonion Quartulli says: ==================== pull request net: batman-adv 20140527 here you have another very small fix intended for net/linux-3.15. It prevents some multicast functions from dereferencing a NULL pointer. (Actually it was nothing more than a typo) I hope it is not too late for such a small patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31batman-adv: fix NULL pointer dereferencesMarek Lindner1-3/+3
Was introduced with 4c8755d69cbde2ec464a39c932aed0a83f9ff89f ("batman-adv: Send multicast packets to nodes with a WANT_ALL flag") Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-05-31Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-5/+10
Pablo Neira Ayuso says: ==================== The following patchset contains a late fix for IPVS: * Fix crash when trying to remove the transport header with non-linear skbuffs, this was introduced in 3.6-rc. Patch from Peter Christensen via the IPVS folks. I'll pass this to -stable once this hits mainstream. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller8-50/+118
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next This small patchset contains three accumulated Netfilter/IPVS updates, they are: 1) Refactorize common NAT code by encapsulating it into a helper function, similarly to what we do in other conntrack extensions, from Florian Westphal. 2) A minor format string mismatch fix for IPVS, from Masanari Iida. 3) Add quota support to the netfilter accounting infrastructure, now you can add quotas to accounting objects via the nfnetlink interface and use them from iptables. You can also listen to quota notifications from userspace. This enhancement from Mathieu Poirier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31af_key: Replace comma with semicolonHimangi Saraogi1-1/+1
This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31rds/tcp_listen: Replace comma with semicolonHimangi Saraogi1-1/+1
This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31RDS/RDMA: Replace comma with semicolonHimangi Saraogi1-1/+1
This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31ipmr: Replace comma with semicolonHimangi Saraogi1-1/+1
This patch replaces a comma between expression statements by a semicolon. A simplified version of the semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2,e; type T; identifier i; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31af_iucv: correct cleanup if listen backlog is fullUrsula Braun1-2/+1
In case of transport HIPER a sock struct is allocated for an incoming connect request. If the backlog queue is full this socket is not needed, but is left in the list of af_iucv sockets. Final socket release posts console message "Attempt to release alive iucv socket". This patch makes sure the new created socket is cleaned up correctly if the backlog queue is full. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31af_iucv: Add automatic (source) iucv_name to bindPhilipp Hachtmann1-11/+18
If a socket is bound to an address using before calling connect it is usual to leave it to the network system to choose an appropriate outgoing application name respective port address. af_iucv on VM uses a counter and uses simple numbers as unique identifiers. This behaviour was missing when af_iucv is used with HiperSockets. This patch contains a simple approach to harmonize af_iucv's behaviour. Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31Merge branch 'for-davem' of ↵David S. Miller46-653/+1570
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please pull this batch of updates intended for 3.16... For the mac80211 bits, Johannes says: "Here I just have Heikki's rfkill GPIO cleanups. The ARM/tegra patch is OK with the maintainer (Stephen). Let me know of any problems." and; "We have a whole bunch of work on CSA by Andrei, Luca and Michal, but unfortunately it doesn't seem quite complete yet so it's still disabled. There's some TDLS work from Arik, and the rest is mostly minor fixes and cleanups." For the NFC bits, Samuel says: "This is the NFC pull request for 3.16. We have: - STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and thus relies on the HCI stack. This submission provides support for tag redaer/writer mode (including Type 5) and device tree bindings. - PM runtime support and a bunch of bug fixes for TI's trf7970a. - Device tree support for NXP's pn544. Legacy platform data support is obviously kept intact. - NFC Tag type 4B support to the NFC Digital stack. - SOCK_RAW type support to the raw NFC socket, and allow NCI sniffing from that. This can be extended to report HCI frames and also proprietarry ones like e.g. the pn533 ones." For the iwlwifi bits, Emmanuel says: "Eran continues to work on new devices, Eyal is still digging in the rate control stuff, and Johannes added new functionality to the debug system we have in place now along with a few cleanups he made on the way. That's pretty much it." and; "Avri continues to work on the power code and Eran is improving the NVM handling as a preparations for new devices on which he works with Liad. Luca cleans up a bit the code while working on CSA. I have the regular BT Coex stuff and a small lockdep fix. Johannes has his regular amount of clean ups and improvements, the main one is the ability to leave 2 chains open to improve diversity and hence the throughput in high attenuation scenarios." and; "The regular amount of housekeeping here. I merged iwlwifi-fixes.git to be able to add the patch you didn't want in wireless.git at that stage of the -rc cycle. Luca has a few preparations for CSA implementation and also what seems to be a bugfix for P2P but hasn't caused issues we could notice." For the Atheros bits, Kalle says: "For ath10k Michal did various small fixes on how we handle hardware/firmware problems and he also fixed two memory leaks." Also included are a couple of pulls from the wireless tree to avoid/resolve merge issues... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31atm: remove commented out checkPaul Bolle1-8/+2
This preprocessor check is commented out ever since this file was added during the v2.3 development cycle. It is unclear what it purpose might have been. Whatever it was, it can safely be removed now. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David S. Miller <davem@davemloft.net>