summaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)AuthorFilesLines
2013-08-09netprio_cgroup: pass around @css instead of @cgroup and kill struct ↵Tejun Heo1-28/+28
cgroup_netprio_state cgroup controller API will be converted to primarily use struct cgroup_subsys_state instead of struct cgroup. In preparation, make the internal functions of netprio_cgroup pass around @css instead of @cgrp. While at it, kill struct cgroup_netprio_state which only contained struct cgroup_subsys_state without serving any purpose. All functions are converted to deal with @css directly. This patch shouldn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David S. Miller <davem@davemloft.net>
2013-08-09cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/Tejun Heo1-1/+1
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08net: use skb_copy_datagram_from_iovec() in zerocopy_sg_from_iovec()Jason Wang1-27/+10
Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to be read. Also we can do the skipping inside the zero-copy loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08net: use release_pages() in zerocopy_sg_from_iovec()Jason Wang1-4/+2
To reduce the duplicated codes. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08net: remove the useless comment in zerocopy_sg_from_iovec()Jason Wang1-1/+0
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08net: use skb_fill_page_desc() in zerocopy_sg_from_iovec()Jason Wang1-2/+1
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08net: move zerocopy_sg_from_iovec() to net/core/datagram.cJason Wang1-0/+93
To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08net: move iov_pages() to net/core/iovec.cJason Wang1-0/+24
To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-06neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico1-4/+6
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-14/+34
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03fib_rules: fix suppressor names and default valuesStefan Tomanek1-4/+11
This change brings the suppressor attribute names into line; it also changes the data types to provide a more consistent interface. While -1 indicates that the suppressor is not enabled, values >= 0 for suppress_prefixlen or suppress_ifgroup reject routing decisions violating the constraint. This changes the previously presented behaviour of suppress_prefixlen, where a prefix length _less_ than the attribute value was rejected. After this change, a prefix length less than *or* equal to the value is considered a violation of the rule constraint. It also changes the default values for default and newly added rules (disabling any suppression for those). Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico1-4/+6
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03fib_rules: add route suppression based on ifgroupStefan Tomanek1-0/+10
This change adds the ability to suppress a routing decision based upon the interface group the selected interface belongs to. This allows it to exclude specific devices from a routing decision. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03net: check net.core.somaxconn sysctl valuesRoman Gushchin1-1/+5
It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there is no checks at all. The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless. before: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 net.core.somaxconn = 65536 $ sysctl -w net.core.somaxconn=-100 net.core.somaxconn = -100 after: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 error: "Invalid argument" setting key "net.core.somaxconn" $ sysctl -w net.core.somaxconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Reported-by: Changli Gao <xiaosuo@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLLCong Wang3-5/+5
Eliezer renames several *ll_poll to *busy_poll, but forgets CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too. Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01fib_rules: add .suppress operationStefan Tomanek1-0/+8
This change adds a new operation to the fib_rules_ops struct; it allows the suppression of routing decisions if certain criteria are not met by its results. The first implemented constraint is a minimum prefix length added to the structures of routing rules. If a rule is added with a minimum prefix length >0, only routes meeting this threshold will be considered. Any other (more general) routing table entries will be ignored. When configuring a system with multiple network uplinks and default routes, it is often convinient to reference the main routing table multiple times - but omitting the default route. Using this patch and a modified "ip" utility, this can be achieved by using the following command sequence: $ ip route add table secuplink default via 10.42.23.1 $ ip rule add pref 100 table main prefixlength 1 $ ip rule add pref 150 fwmark 0xA table secuplink With this setup, packets marked 0xA will be processed by the additional routing table "secuplink", but only if no suitable route in the main routing table can be found. By using a minimal prefixlength of 1, the default route (/0) of the table "main" is hidden to packets processed by rule 100; packets traveling to destinations with more specific routing entries are processed as usual. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01netem: Introduce skb_orphan_partial() helperEric Dumazet1-0/+19
Commit 547669d483e578 ("tcp: xps: fix reordering issues") added unexpected reorders in case netem is used in a MQ setup for high performance test bed. ETH=eth0 tc qd del dev $ETH root 2>/dev/null tc qd add dev $ETH root handle 1: mq for i in `seq 1 32` do tc qd add dev $ETH parent 1:$i netem delay 100ms done As all tcp packets are orphaned by netem, TCP stack believes it can set skb->ooo_okay on all packets. In order to allow producers to send more packets, we want to keep sk_wmem_alloc from reaching sk_sndbuf limit. We can do that by accounting one byte per skb in netem queues, so that TCP stack is not fooled too much. Tested: With above MQ/netem setup, scaling number of concurrent flows gives linear results and no reorders/retransmits lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done n:1 198.46 n:10 2002.69 n:20 4000.98 n:30 6006.35 n:40 8020.93 n:50 10032.3 n:60 12081.9 n:70 13971.3 n:80 16009.7 n:90 17117.3 n:100 17425.5 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31flow_dissector: add support for IPPROTO_IPV6Tom Herbert1-0/+3
Support IPPROTO_IPV6 similar to IPPROTO_IPIP Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31flow_dissector: clean up IPIP caseTom Herbert1-1/+2
Explicitly set proto to ETH_P_IP and jump directly to ip processing. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31net: export physical port id via sysfsJiri Pirko1-0/+22
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Narendra K <narendra_k@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31rtnl: export physical port id via RT netlinkJiri Pirko1-1/+24
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Narendra K <narendra_k@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31net: add ndo to get id of physical port of the deviceJiri Pirko1-0/+18
This patch adds a ndo for getting physical port of the device. Driver which is aware of being virtual function of some physical port should implement this ndo. This is applicable not only for IOV, but for other solutions (NPAR, multichannel) as well. Basically if there is possible to have multiple netdevs on the single hw port. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-29pktgen: add needed include fileStephen Rothwell1-0/+1
Fixes this on PowerPC (at least): net/core/pktgen.c: In function 'fill_packet_ipv6': net/core/pktgen.c:2906:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] udph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, udplen, IPPROTO_UDP, 0); ^ Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-28pktgen: Use ip_send_check() to compute checksumThomas Graf1-2/+1
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-28pktgen: Add UDPCSUM flag to support UDP checksumsThomas Graf1-5/+52
UDP checksums are optional, hence pktgen has been omitting them in favour of performance. The optional flag UDPCSUM enables UDP checksumming. If the output device supports hardware checksumming the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the checksum is generated in software. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <edumazet@google.com> Cc: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27net: core: convert class code to use dev_groupsGreg Kroah-Hartman1-61/+75
The dev_attrs field of struct class is going away soon, dev_groups should be used instead. This converts the networking core class code to use the correct field. In order to do this in the "cleanest" way, some of the macros had to be changed to reflect the driver core format of naming show/store functions, which accounts for the majority of the churn in this file. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-27neigh: prevent overflowing params in /proc/sys/net/ipv4/neigh/Francesco Fusco1-7/+22
Without this patch, the fields app_solicit, gc_thresh1, gc_thresh2, gc_thresh3, proxy_qlen, ucast_solicit, mcast_solicit could have assumed negative values when setting large numbers. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25net: fix comment above build_skb()Florian Fainelli1-1/+2
build_skb() specifies that the data parameter must come from a kmalloc'd area, this is only true if frag_size equals 0, because then build_skb() will use kzsize(data) to figure out the actual data size. Update the comment to reflect that special condition. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25net: Make devnet_rename_seq staticThomas Gleixner1-1/+1
No users outside net/core/dev.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25net: add sk_stream_is_writeable() helperEric Dumazet1-1/+1
Several call sites use the hardcoded following condition : sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) Lets use a helper because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-23net: Provide a generic socket error queue delivery method for Tx time stamps.Richard Cochran1-0/+47
This patch moves the private error queue delivery function from the af_packet code to the core socket method. In this way, network layers only needing the error queue for transmit time stamping can share common code. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-12/+29
Pull networking fixes from David Miller: "A couple interesting SKB fragment handling fixes, plus the usual small bits here and there: 1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from Tim Gardner. 2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC address printing helper. 3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the device can't do checksumming offloads then it shouldn't say it can do SG either. From Haiyang Zhang. 4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens. 5) Don't leak DMA mappings on mapping failures, from Neil Horman. 6) We need to reset the transport header of SKBs in ipv4 before we attempt to perform early socket demux, just like ipv6 does. From Eric Dumazet. 7) Add missing locking on vxlan device removal, from Stephen Hemminger. 8) xen-netfront has to make two passes over an SKB to prepare it for transfer. One pass calculates the number of slots needed, the second massages the SKB and fills the slots. Unfortunately, the first pass doesn't calculate the number of slots properly so we can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't work out so well. Fix from Jan Beulich with help and discussion with several others. 9) Fix a similar problem in tun and macvtap, which have to split up scatter-gather elements at PAGE_SIZE boundaries. Don't do zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from Jason Wang. 10) On receive, once we've decoded the VLAN state completely, clear skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger the VLAN code again, corrupting the packet. Fix from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: vlan: fix a race in egress prio management vlan: mask vlan prio bits macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS pkt_sched: sch_qfq: remove a source of high packet delay/jitter xen-netfront: pull on receive skb may need to happen earlier vxlan: add necessary locking on device removal hyperv: Fix the NETIF_F_SG flag setting in netvsc net: Fix sysfs_format_mac() code duplication. be2net: Fix to avoid hardware workaround when not needed macvtap: do not assume 802.1Q when send vlan packets macvtap: fix the missing ret value of TUNSETQUEUE ipv4: set transport header earlier mlx5 core: Fix __udivdi3 when compiling for 32 bit arches bgmac: add dependency to phylib net/irda: fixed style issues in irlan_eth ethtool: fixed trailing statements in ethtool ndisc: bool initializations should use true and false atl1e: unmap partially mapped skb on dma error and free skb
2013-07-19vlan: mask vlan prio bitsEric Dumazet1-2/+9
In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e ("vlan: don't deliver frames for unknown vlans to protocols") Florian made sure we set pkt_type to PACKET_OTHERHOST if the vlan id is set and we could find a vlan device for this particular id. But we also have a problem if prio bits are set. Steinar reported an issue on a router receiving IPv6 frames with a vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device, because skb->vlan_tci is set. Forwarded frame is completely corrupted : We can see (8100:4000) being inserted in the middle of IPv6 source address : 16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 > 9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64 0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651 0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213 0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 It seems we are not really ready to properly cope with this right now. We can probably do better in future kernels : vlan_get_ingress_priority() should be a netdev property instead of a per vlan_dev one. For stable kernels, lets clear vlan_tci to fix the bugs. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16ethtool: fixed trailing statements in ethtoolDragos Foianu1-10/+20
Applied fixes suggested by checkpatch.pl. Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-15net: delete __cpuinit usage from all net filesPaul Gortmaker1-2/+2
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the net/* uses of the __cpuinit macros from all C files. [1] https://lkml.org/lkml/2013/5/20/589 Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-13net: access page->private by using page_privateSunghan Suh1-3/+3
Signed-off-by: Sunghan Suh <sunghan.suh@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11gso: Update tunnel segmentation to support Tx checksum offloadAlexander Duyck1-8/+6
This change makes it so that the GRE and VXLAN tunnels can make use of Tx checksum offload support provided by some drivers via the hw_enc_features. Without this fix enabling GSO means sacrificing Tx checksum offload and this actually leads to a performance regression as shown below: Utilization Send Throughput local GSO 10^6bits/s % S state 6276.51 8.39 enabled 7123.52 8.42 disabled To resolve this it was necessary to address two items. First netif_skb_features needed to be updated so that it would correctly handle the Trans Ether Bridging protocol without impacting the need to check for Q-in-Q tagging. To do this it was necessary to update harmonize_features so that it used skb_network_protocol instead of just using the outer protocol. Second it was necessary to update the GRE and UDP tunnel segmentation offloads so that they would reset the encapsulation bit and inner header offsets after the offload was complete. As a result of this change I have seen the following results on a interface with Tx checksum enabled for encapsulated frames: Utilization Send Throughput local GSO 10^6bits/s % S state 7123.52 8.42 disabled 8321.75 5.43 enabled v2: Instead of replacing refrence to skb->protocol with skb_network_protocol just replace the protocol reference in harmonize_features to allow for double VLAN tag checks. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11net: rename busy poll socket op and globalsEliezer Tamir2-7/+7
Rename LL_SO to BUSY_POLL_SO Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll} Fix up users of these variables. Fix documentation for sysctl. a patch for the socket.7 man page will follow separately, because of limitations of my mail setup. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11net: rename include/net/ll_poll.h to include/net/busy_poll.hEliezer Tamir3-3/+3
Rename the file and correct all the places where it is included. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds18-173/+556
Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...
2013-07-09net: rename low latency sockets functions to busy pollEliezer Tamir1-1/+2
Rename functions in include/net/ll_poll.h to busy wait. Clarify documentation about expected power use increase. Rename POLL_LL to POLL_BUSY_LOOP. Add need_resched() testing to poll/select busy loops. Note, that in select and poll can_busy_poll is dynamic and is updated continuously to reflect the existence of supported sockets with valid queue information. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-04core: Copy inner_protocol in copy_skb_header()Joe Stringer1-0/+1
inner_protocol was added to struct sk_buff in 0d89d2035fe063461a5ddb609b2c12e7fb006e44 ("MPLS: Add limited GSO support"), which is scheduled to be included in v3.11. That patch did not update __copy_skb_header to copy the inner_protocol. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-38/+52
Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()Isaku Yamahata1-0/+6
The dev_forward_skb() assignment of pkt_type should be done after the call to eth_type_trans(). ip-encapsulated packets can be handled by localhost. But skb->pkt_type can be PACKET_OTHERHOST when packet comes via veth into ip tunnel device. In that case, the packet is dropped by ip_rcv(). Although this example uses gretap. l2tp-eth also has same issue. For l2tp-eth case, add dummy device for ip address and ip l2tp command. netns A | root netns | netns B veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth arp packet -> pkt_type BROADCAST------------>ip_rcv()------------------------> <- arp reply pkt_type ip_rcv()<-----------------OTHERHOST drop sample operations ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3 ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4 ip link set tapa up ip link set tapb up ip address add 172.17.107.3 dev tapa ip address add 172.17.107.4 dev tapb ip route get 172.17.107.3 > local 172.17.107.3 dev lo src 172.17.107.3 > cache <local> ip route get 172.17.107.4 > local 172.17.107.4 dev lo src 172.17.107.4 > cache <local> ip link add vetha type veth peer name vetha-peer ip link add vethb type veth peer name vethb-peer brctl addbr bra brctl addbr brb brctl addif bra tapa brctl addif bra vetha-peer brctl addif brb tapb brctl addif brb vethb-peer brctl show > bridge name bridge id STP enabled interfaces > bra 8000.6ea21e758ff1 no tapa > vetha-peer > brb 8000.420020eb92d5 no tapb > vethb-peer ip link set vetha-peer up ip link set vethb-peer up ip link set bra up ip link set brb up ip netns add a ip netns add b ip link set vetha netns a ip link set vethb netns b ip netns exec a ip address add 10.0.0.3/24 dev vetha ip netns exec b ip address add 10.0.0.4/24 dev vethb ip netns exec a ip link set vetha up ip netns exec b ip link set vethb up ip netns exec a arping -I vetha 10.0.0.4 ARPING 10.0.0.4 from 10.0.0.3 vetha ^CSent 2 probes (2 broadcast(s)) Received 0 response(s) Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Hong Zhiguo <honkiko@gmail.com> Cc: Rami Rosen <ramirose@gmail.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Jesse Gross <jesse@nicira.com> Cc: dev@openvswitch.org Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02Merge tag 'char-misc-3.11-rc1' of ↵Linus Torvalds2-22/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc updates from Greg KH: "Here's the big char/misc driver tree merge for 3.11-rc1 A variety of different driver patches here. All of these have been in linux-next for a while, and the networking patches were acked-by David Miller, as it made sense for those patches to come through this tree" * tag 'char-misc-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (102 commits) Revert "char: misc: assign file->private_data in all cases" drivers: uio_pdrv_genirq: Use of_match_ptr() macro mei: check whether hw start has succeeded mei: check if the hardware reset succeeded mei: mei_cl_connect: don't multiply the timeout twice mei: do not override a client writing state when buffering mei: move mei_cl_irq_write_complete to client.c UIO: Fix concurrency issue drivers: uio_dmem_genirq: Use of_match_ptr() macro char: misc: assign file->private_data in all cases drivers: hv: allocate synic structures before hv_synic_init() drivers: hv: check interrupt mask before read_index vme: vme_tsi148.c: fix error return code in tsi148_probe() FMC: fix error handling in probe() function fmc: avoid readl/writel namespace conflict FMC: NULL dereference on allocation failure UIO: fix uio_pdrv_genirq with device tree but no interrupt UIO: allow binding uio_pdrv_genirq.c to devices using command line option FMC: add a char-device mezzanine driver FMC: add a driver to write mezzanine EEPROM ...
2013-07-02ethtool: make .get_dump_data() harder to misuse by driversMichal Schmidt1-1/+20
As the patch "bnx2x: remove zeroing of dump data buffer" showed, it is too easy implement .get_dump_data incorrectly in a driver. Let's make sure drivers cannot get confused by userspace requesting a too big dump. Also WARN if the driver sets dump->len to something weird and make sure the length reported to userspace is the actual length of data copied to userspace. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02Merge branch 'master' of ↵David S. Miller1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/vxlan-next Stephen Hemminger says: ==================== Here is current updates for vxlan in net-next. It includes Mike's changes to handle multiple destinations and lots of little cosmetic stuff. This is a fresh vxlan-next repository which was forked from net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02neighbour: fix a race in neigh_destroy()Eric Dumazet1-5/+7
There is a race in neighbour code, because neigh_destroy() uses skb_queue_purge(&neigh->arp_queue) without holding neighbour lock, while other parts of the code assume neighbour rwlock is what protects arp_queue Convert all skb_queue_purge() calls to the __skb_queue_purge() variant Use __skb_queue_head_init() instead of skb_queue_head_init() to make clear we do not use arp_queue.lock And hold neigh->lock in neigh_destroy() to close the race. Reported-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-28dev: introduce skb_scrub_packet()Nicolas Dichtel2-10/+24
The goal of this new function is to perform all needed cleanup before sending an skb into another netns. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-27net: fix kernel deadlock with interface rename and netdev name retrieval.Nicolas Schichan3-30/+40
When the kernel (compiled with CONFIG_PREEMPT=n) is performing the rename of a network interface, it can end up waiting for a workqueue to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to the fact that read_secklock_begin() will spin forever waiting for the writer process (the one doing the interface rename) to update the devnet_rename_seq sequence. This patch fixes the problem by adding a helper (netdev_get_name()) and using it in the code handling the SIOCGIFNAME ioctl and SO_BINDTODEVICE setsockopt. The netdev_get_name() helper uses raw_seqcount_begin() to avoid spinning forever, waiting for devnet_rename_seq->sequence to become even. cond_resched() is used in the contended case, before retrying the access to give the writer process a chance to finish. The use of raw_seqcount_begin() will incur some unneeded work in the reader process in the contended case, but this is better than deadlocking the system. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>