summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2011-08-25Scm: Remove unnecessary pid & credential references in Unix socket's send ↵Tim Chen1-16/+29
and receive path Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to allow credentials to work across pid namespaces for packets sent via UNIX sockets. However, the atomic reference counts on pid and credentials caused plenty of cache bouncing when there are numerous threads of the same pid sharing a UNIX socket. This patch mitigates the problem by eliminating extraneous reference counts on pid and credentials on both send and receive path of UNIX sockets. I found a 2x improvement in hackbench's threaded case. On the receive path in unix_dgram_recvmsg, currently there is an increment of reference count on pid and credentials in scm_set_cred. Then there are two decrement of the reference counts. Once in scm_recv and once when skb_free_datagram call skb->destructor function unix_destruct_scm. One pair of increment and decrement of ref count on pid and credentials can be eliminated from the receive path. Until we destroy the skb, we already set a reference when we created the skb on the send side. On the send path, there are two increments of ref count on pid and credentials, once in scm_send and once in unix_scm_to_skb. Then there is a decrement of the reference counts in scm_destroy's call to scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair of increment and decrement of the reference counts can be removed so we only need to increment the ref counts once. By incorporating these changes, for hackbench running on a 4 socket NHM-EX machine with 40 cores, the execution of hackbench on 50 groups of 20 threads sped up by factor of 2. Hackbench command used for testing: ./hackbench 50 thread 2000 Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25sctp: Bundle HEAERTBEAT into ASCONF_ACKMichio Honda3-0/+7
With this patch a HEARTBEAT chunk is bundled into the ASCONF-ACK for ADD IP ADDRESS, confirming the new destination as quickly as possible. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25sctp: HEARTBEAT negotiation after ASCONFMichio Honda1-0/+4
This patch fixes BUG that the ASCONF receiver transmits DATA chunks to the newly added UNCONFIRMED destination. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25Proportional Rate Reduction for TCP.Nandita Dukkipati2-7/+58
This patch implements Proportional Rate Reduction (PRR) for TCP. PRR is an algorithm that determines TCP's sending rate in fast recovery. PRR avoids excessive window reductions and aims for the actual congestion window size at the end of recovery to be as close as possible to the window determined by the congestion control algorithm. PRR also improves accuracy of the amount of data sent during loss recovery. The patch implements the recommended flavor of PRR called PRR-SSRB (Proportional rate reduction with slow start reduction bound) and replaces the existing rate halving algorithm. PRR improves upon the existing Linux fast recovery under a number of conditions including: 1) burst losses where the losses implicitly reduce the amount of outstanding data (pipe) below the ssthresh value selected by the congestion control algorithm and, 2) losses near the end of short flows where application runs out of data to send. As an example, with the existing rate halving implementation a single loss event can cause a connection carrying short Web transactions to go into the slow start mode after the recovery. This is because during recovery Linux pulls the congestion window down to packets_in_flight+1 on every ACK. A short Web response often runs out of new data to send and its pipe reduces to zero by the end of recovery when all its packets are drained from the network. Subsequent HTTP responses using the same connection will have to slow start to raise cwnd to ssthresh. PRR on the other hand aims for the cwnd to be as close as possible to ssthresh by the end of recovery. A description of PRR and a discussion of its performance can be found at the following links: - IETF Draft: http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01 - IETF Slides: http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf - Paper to appear in Internet Measurements Conference (IMC) 2011: Improving TCP Loss Recovery Nandita Dukkipati, Matt Mathis, Yuchung Cheng Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25af-packet: TPACKET_V3 flexible buffer implementation.chetan loke1-46/+891
1) Blocks can be configured with non-static frame-size. 2) Read/poll is at a block-level(as opposed to packet-level). 3) Added poll timeout to avoid indefinite user-space wait on idle links. 4) Added user-configurable knobs: 4.1) block::timeout. 4.2) tpkt_hdr::sk_rxhash. Changes: C1) tpacket_rcv() C1.1) packet_current_frame() is replaced by packet_current_rx_frame() The bulk of the processing is then moved in the following chain: packet_current_rx_frame() __packet_lookup_frame_in_block fill_curr_block() or retire_current_block dispatch_next_block or return NULL(queue is plugged/paused) Signed-off-by: Chetan Loke <loke.chetan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25IEEE802.15.4: 6LoWPAN basic supportAlexander Smirnov4-3/+1108
This patch provides base support for transmission of IPv6 packets as well as the formation of IPv6 link-local addresses and statelessly autoconfigured addresses on top of IEEE 802.15.4 networks. For more information please look at the RFC4944 "Compression Format for IPv6 Datagrams in Low Power and Losst Networks (6LoWPAN). Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25net: xfrm: convert to SKB frag APIsIan Campbell1-4/+7
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25net: ipv6: convert to SKB frag APIsIan Campbell1-3/+4
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25net: ipv4: convert to SKB frag APIsIan Campbell4-6/+8
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25net: convert core to skb paged frag APIsIan Campbell7-36/+36
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-25rps: support IPIP encapsulationEric Dumazet1-0/+2
Skip IPIP header to get proper layer-4 information. Like GRE tunnels, this only works if rxhash is not already provided by the device itself (ethtool -K ethX rxhash off), to allow kernel compute a software rxhash. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24Merge branch 'batman-adv/next' of git://git.open-mesh.org/linux-mergeDavid S. Miller20-148/+283
2011-08-23Merge branch 'for-davem' of ↵David S. Miller19-417/+375
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
2011-08-22net: vlan: goto another_round instead of calling __netif_receive_skbJiri Pirko1-4/+3
Now, when vlan tag on untagged in non-accelerated path is stripped from skb, headers are reset right away. Benefit from that and avoid calling __netif_receive_skb recursivelly and just use another_round. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-22Merge branch 'master' of ↵John W. Linville19-417/+375
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/staging/ath6kl/miscdrv/ar3kps/ar3kpsparser.c drivers/staging/ath6kl/os/linux/ar6000_drv.c
2011-08-22batman-adv: merge update_transtable() into tt related codeMarek Lindner3-74/+70
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22batman-adv: reuse tt_len() to calculate tt buffer lengthMarek Lindner1-2/+1
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org>
2011-08-22batman-adv: print client flags in the local/global transtables outputAntonio Quartulli1-10/+27
Since clients can have several flags on or off, this patches make them appear in the local/global transtable output so that they can be checked for debugging purposes. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22batman-adv: implement AP-isolation on the sender sideAntonio Quartulli5-12/+29
If a node has to send a packet issued by a WIFI client to another WIFI client, the packet is dropped. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22batman-adv: implement AP-isolation on the receiver sideAntonio Quartulli5-0/+50
When a node receives a unicast packet it checks if the source and the destination client can communicate or not due to the AP isolation Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22batman-adv: detect clients connected through a 802.11 deviceAntonio Quartulli9-11/+55
Clients connected through a 802.11 device are now marked with the TT_CLIENT_WIFI flag. This flag is also advertised with the tt announcement. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22batman-adv: correct several typ0s in the commentsAntonio Quartulli12-37/+36
Several typos have been corrected and some sentences have been rephrased Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22batman-adv: hash_add() has to discriminate on the return valueAntonio Quartulli3-9/+22
hash_add() returns 0 on success while returns -1 either on error and on entry already present. The caller could use such information to select its behaviour. For this reason it is useful that hash_add() returns -1 in case on error and returns 1 in case of entry already present. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-21net: Preserve ooo_okay when copying skb headerChangli Gao1-0/+1
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-26/+43
2011-08-19ipv6: Fix ipv6_getsockopt for IPV6_2292PKTOPTIONSDaniel Baluta1-4/+5
IPV6_2292PKTOPTIONS is broken for 32-bit applications running in COMPAT mode on 64-bit kernels. The same problem was fixed for IPv4 with the patch: ipv4: Fix ip_getsockopt for IP_PKTOPTIONS, commit dd23198e58cd35259dd09e8892bbdb90f1d57748 Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-19net: rps: support PPPOE session messagesChangli Gao1-0/+8
Inspect the payload of PPPOE session messages for the 4 tuples to generate skb->rxhash. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-19net: rps: support 802.1QChangli Gao1-0/+8
For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS won't inspect the internal 4 tuples to generate skb->rxhash, so this kind of traffic can't get any benefit from RPS. This patch adds the support for 802.1Q to RPS. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-19vlan: reset headers on accel emulation pathJiri Pirko1-0/+2
It's after all necessary to do reset headers here. The reason is we cannot depend that it gets reseted in __netif_receive_skb once skb is reinjected. For incoming vlanids without vlan_dev, vlan_do_receive() returns false with skb != NULL and __netif_reveive_skb continues, skb is not reinjected. This might be good material for 3.0-stable as well Reported-by: Mike Auty <mike.auty@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18net: remove ndo_set_multicast_list callbackJiri Pirko3-7/+5
Remove no longer used operation. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18net: remove use of ndo_set_multicast_list in driversJiri Pirko7-10/+6
replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18net: introduce IFF_UNICAST_FLT private flagJiri Pirko1-6/+6
Use IFF_UNICAST_FTL to find out if driver handles unicast address filtering. In case it does not, promisc mode is entered. Patch also fixes following drivers: stmmac, niu: support uc filtering and yet it propagated ndo_set_multicast_list bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set ndo_set_rx_mode but do not support uc filtering Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18net_sched: fix port mirror/redirect stats reportingJamal Hadi Salim1-2/+1
When a redirected or mirrored packet is dropped by the target device we need to record statistics. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18rps: Inspect GRE encapsulated packets to get flow hashTom Herbert1-0/+22
Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on in encapsulated packet. Note that this is used only when the __skb_get_rxhash is taken, in particular only when the device does not compute provide the rxhash (ie. feature is disabled). This was tested by creating a single GRE tunnel between two 16 core AMD machines. 200 netperf TCP_RR streams were ran with 1 byte request and response size. Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs With patch: 325896 tps, 50/90/99% latencies 603/848/1169 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18rps: Infrastructure in __skb_get_rxhash for deep inspectionTom Herbert1-0/+7
Basics for looking for ports in encapsulated packets in tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18rps: Add flag to skb to indicate rxhash is based on L4 tupleTom Herbert6-13/+16
The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18rps: Some minor cleanup in get_rps_cpusTom Herbert1-5/+7
Use some variables for clarity and extensibility. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17sit tunnels: propagate IPv6 transport class to IPv4 Type of ServiceLionel Elie Mamane1-0/+3
sit tunnels (IPv6 tunnel over IPv4) do not implement the "tos inherit" case to copy the IPv6 transport class byte from the inner packet to the IPv4 type of service byte in the outer packet. By contrast, ipip tunnels and GRE tunnels do. This patch, adapted from the similar code in net/ipv4/ipip.c and net/ipv4/ip_gre.c, implements that. This patch applies to 3.0.1, and has been tested on that version. Signed-off-by: Lionel Elie Mamane <lionel@mamane.lu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13af_iucv: add HiperSockets transportUrsula Braun1-72/+677
The current transport mechanism for af_iucv is the z/VM offered communications facility IUCV. To provide equivalent support when running Linux in an LPAR, HiperSockets transport is added to the AF_IUCV address family. It requires explicit binding of an AF_IUCV socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV is announced. An af_iucv specific transport header is defined preceding the skb data. A small protocol is implemented for connecting and for flow control/congestion management. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13af_iucv: cleanup - use iucv_sk(sk) earlyUrsula Braun1-21/+23
Code cleanup making make use of local variable for struct iucv_sock. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13af_iucv: use loadable iucv interfaceFrank Blaschka1-45/+74
For future af_iucv extensions the module should be able to run in LPAR mode too. For this we use the new dynamic loading iucv interface. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13iucv: kernel option for z/VM IUCV and HiperSocketsUrsula Braun1-6/+8
When adding HiperSockets transport to AF_IUCV Sockets, af_iucv either depends on IUCV or QETH_L3 (or both). This patch introduces the necessary changes for kernel configuration. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13iucv: introduce loadable iucv interfaceFrank Blaschka1-0/+23
This patch adds a symbol to dynamically load iucv functions. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12nl80211/cfg80211: Add extra IE configuration to AP mode setupJouni Malinen1-1/+27
The NL80211_CMD_NEW_BEACON command is, in practice, requesting AP mode operations to be started. Add new attributes to provide extra IEs (e.g., WPS IE, P2P IE) for drivers that build Beacon, Probe Response, and (Re)Association Response frames internally (likely in firmware). Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12nl80211/cfg80211: Add crypto settings into NEW_BEACONJouni Malinen1-0/+21
This removes need from drivers to parse the beacon tail/head data to figure out what crypto settings are to be used in AP mode in case the Beacon and Probe Response frames are fully constructed in the driver/firmware. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12nl80211/cfg80211: Allow SSID to be specified in new beacon commandJouni Malinen1-0/+29
This makes it easier for drivers that generate Beacon and Probe Response frames internally (in firmware most likely) in AP mode. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12cfg80211/mac80211: move information element parsing logic to cfg80211Yogesh Ashok Powar3-230/+168
Moving the parsing logic for retrieving the information elements stored in management frames, e.g. beacons or probe responses, and making it available to other cfg80211 drivers. Signed-off-by: Yogesh Ashok Powar <yogeshp@marvell.com> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-12net: cleanup some rcu_dereference_rawEric Dumazet10-23/+22
RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12neigh: reduce arp latencyEric Dumazet1-14/+26
Remove the artificial HZ latency on arp resolution. Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), lets send the ARP message immediately. Before patch : # arp -d 192.168.20.108 ; ping -c 3 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data. 64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms 64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms 64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms After patch : $ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data. 64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms 64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms 64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11nl80211/cfg80211: Make addition of new sinfo fields saferJouni Malinen2-0/+2
Add a comment pointing out the use of enum station_info_flags for all new struct station_info fields. In addition, memset the sinfo buffer to zero before use on all paths in the current tree to avoid leaving uninitialized pointers in the data. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>