summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2009-11-04net: Introduce for_each_netdev_rcu() iteratorEric Dumazet10-77/+66
Adds RCU management to the list of netdevices. Convert some for_each_netdev() users to RCU version, if it can avoid read_lock-ing dev_base_lock Ie: read_lock(&dev_base_loack); for_each_netdev(net, dev) some_action(); read_unlock(&dev_base_lock); becomes : rcu_read_lock(); for_each_netdev_rcu(net, dev) some_action(); rcu_read_unlock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04em_meta: avoid one dev_put()Eric Dumazet1-6/+6
Another rcu conversion to avoid one dev_hold()/dev_put() pair Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04Phonet: remove tautologiesRémi Denis-Courmont1-4/+2
These checks don't make sense anymore since rtnl_notify() cannot fail. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ipv6: no more dev_put() in datagram_send_ctl()Eric Dumazet1-5/+9
Avoids touching device refcount in datagram_send_ctl(), thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ipv6: no more dev_put() in inet6_bind()Eric Dumazet1-8/+9
Avoids touching device refcount in inet6_bind(), thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ip6tnl: less dev_put() callsEric Dumazet1-6/+5
Using dev_get_by_index_rcu() in ip6_tnl_rcv_ctl() & ip6_tnl_xmit_ctl() avoids touching device refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02packet: less dev_put() callsEric Dumazet1-10/+12
- packet_sendmsg_spkt() can use dev_get_by_name_rcu() to avoid touching device refcount. - packet_getname_spkt() & packet_getname() can use dev_get_by_index_rcu() to avoid touching device refcount too. tpacket_snd() & packet_snd() can not use RCU yet because they can sleep when allocating skb. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02net: RCU locking for simple ioctl()Eric Dumazet1-4/+4
All ioctls() implemented by dev_ifsioc_locked() : SIOCGIFFLAGS, SIOCGIFMETRIC, SIOCGIFMTU, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCGIFMAP, SIOCGIFINDEX & SIOCGIFTXQLEN can use RCU lock instead of dev_base_lock rwlock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02icmp: icmp_send() can avoid a dev_put()Eric Dumazet1-4/+5
We can avoid touching device refcount in icmp_send(), using dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02ipv4: inetdev_by_index() switch to RCUEric Dumazet1-3/+4
Use dev_get_by_index_rcu() instead of __dev_get_by_index() and dev_base_lock rwlock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02veth: Fix unregister_netdevice_queue for vethEric W. Biederman1-1/+2
I tested the recent unregister many changes and got a weird, nasty and seemingly unrelasted kernel oops. Changing unregister_netdevice_queue to use list_move_tail fixes the problem for me. ip link add type veth rmmod veth ls /sys/class/net/ showed one of the veth devices still present. A subsequent ip link oopsed the box. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02net: Introduce dev_get_by_name_rcu()Eric Dumazet1-9/+40
Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock (and avoid touching netdevice refcount) netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. However, it adds a synchronize_rcu() call in dev_change_name() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-31RDS/IB+IW: Move recv processing to a taskletAndy Grover6-12/+52
Move receive processing from event handler to a tasklet. This should help prevent hangcheck timer from going off when RDS is under heavy load. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-31RDS: Do not send congestion updates to loopback connectionsAndy Grover1-0/+2
This issue was discovered by HP's Pradeep and fixed in OFED 1.3, but not fixed in later versions, since the fix's implementation was not immediately applyable to the later code. This patch should do the trick for 1.4+ codebases. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-31RDS: Fix panic on unloadAndy Grover2-8/+2
Remove explicit destruction of passive connection when destroying active end of the connection. The passive end is also on the device's connection list, and will thus be cleaned up properly. Panic was caused by trying to clean it up twice. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-31RDS: Fix potential race around rds_i[bw]_allocationAndy Grover2-6/+8
"At rds_ib_recv_refill_one(), it first executes atomic_read(&rds_ib_allocation) for if-condition checking, and then executes atomic_inc(&rds_ib_allocation) if the condition was not satisfied. However, if any other code which updates rds_ib_allocation executes between these two atomic operation executions, it seems that it may result race condition. (especially when rds_ib_allocation + 1 == rds_ib_sysctl_max_recv_allocation)" This patch fixes this by using atomic_inc_unless to eliminate the possibility of allocating more than rds_ib_sysctl_max_recv_allocation and then decrementing the count if the allocation fails. It also makes an identical change to the iwarp transport. Reported-by: Shin Hong <hongshin@gmail.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-31RDS: Add GET_MR_FOR_DEST sockoptAndy Grover3-0/+28
RDS currently supports a GET_MR sockopt to establish a memory region (MR) for a chunk of memory. However, the fastreg method ties a MR to a particular destination. The GET_MR_FOR_DEST sockopt allows the remote machine to be specified, and thus support for fastreg (aka FRWRs). Note that this patch does *not* do all of this - it simply implements the new sockopt in terms of the old one, so applications can begin to use the new sockopt in preparation for cutover to FRWRs. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30net: Allow devices to specify a device specific sysfs group.Eric W. Biederman1-1/+4
This isn't beautifully abstracted, but it is simple, simplifies uses and so far is only needed for the bonding driver. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30net: use hlist_for_each_entry()Eric Dumazet1-8/+8
Small cleanup of __dev_get_by_name() and __dev_get_by_index() to use hlist_for_each_entry() : They'll look like their _rcu variant. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30vlan: cleanup multiple unregistrationsPatrick McHardy1-32/+20
The temporary copy of the VLAN group is not neccessary since the lower device is already in the process of being unregistered, if it was neccessary the memset of the global group would introduce a race condition. With this removed, the changes to the original code are only a few lines, so remove the new function and move the code back into vlan_device_event(). Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30ax25: unsigned cannot be less than 0 in ax25_ctl_ioctl()roel kluin1-4/+3
struct ax25_ctl_struct member `arg' is unsigned and cannot be less than 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30gro: Change all receive functions to return GRO result codesBen Hutchings2-30/+24
This will allow drivers to adjust their receive path dynamically based on whether GRO is being applied successfully. Currently all in-tree callers ignore the return values of these functions and do not need to be changed. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30gro: Name the GRO result enumeration typeBen Hutchings2-7/+17
This clarifies which return and parameter types are GRO result codes and not RX result codes. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30Merge branch 'master' of ↵David S. Miller9-21/+52
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-10-29net: Fix 'Re: PACKET_TX_RING: packet size is too long'Gabor Gombas1-4/+1
Currently PACKET_TX_RING forces certain amount of every frame to remain unused. This probably originates from an early version of the PACKET_TX_RING patch that in fact used the extra space when the (since removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current code does not make any use of this extra space. This patch removes the extra space reservation and lets userspace make use of the full frame size. Signed-off-by: Gabor Gombas <gombasg@sztaki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build timeCyrill Gorcunov1-1/+1
proto_ops->getname implies copying protocol specific data into storage unit (particulary to __kernel_sockaddr_storage). So when we implement new protocol support we should keep such a detail in mind (which is easy to forget about). Lets introduce DECLARE_SOCKADDR helper which check if storage unit is not overfowed at build time. Eventually inet_getname is switched to use DECLARE_SOCKADDR (to show example of usage). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Merge branch 'master' of ↵David S. Miller6-57/+49
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-10-29net: Introduce dev_get_by_index_rcu()Eric Dumazet1-10/+38
Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock. netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. dev_ifname() converted to dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net: Cleanup redundant tests on unsignedroel kluin3-6/+1
optlen is unsigned so the `< 0' test is never true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net: Cleanup redundant tests on unsignedroel kluin2-2/+2
If there is data, the unsigned skb->len is greater than 0. rt.sigdigits is unsigned as well, so the test `>= 0' is always true, the other part of the test catches wrapped values. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow disabling of DSACK TCP option per routeGilad Ben-Yossef1-2/+6
Add and use no DSCAK bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow to turn off TCP window scale opt per routeGilad Ben-Yossef2-3/+6
Add and use no window scale bit in the features field. Note that this is not the same as setting a window scale of 0 as would happen with window limit on route. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow disabling TCP timestamp options per routeGilad Ben-Yossef2-3/+8
Implement querying and acting upon the no timestamp bit in the feature field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Add the no SACK route option featureGilad Ben-Yossef2-2/+5
Implement querying and acting upon the no sack bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Allow tcp_parse_options to consult dst entryGilad Ben-Yossef6-41/+54
We need tcp_parse_options to be aware of dst_entry to take into account per dst_entry TCP options settings Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29Only parse time stamp TCP option in time wait sockGilad Ben-Yossef1-2/+2
Since we only use tcp_parse_options here to check for the exietence of TCP timestamp option in the header, it is better to call with the "established" flag on. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Signed-off-by: Ori Finkelman <ori@comsleep.com> Signed-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ip6mr: Optimize multiple unregistrationEric Dumazet1-5/+10
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ipv6 sit: Optimize multiple unregistrationEric Dumazet1-6/+11
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ipmr: Optimize multiple unregistrationEric Dumazet1-5/+10
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29ip6tnl: Optimize multiple unregistrationEric Dumazet1-3/+8
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29bridge: Optimize multiple unregistrationEric Dumazet1-10/+9
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)Neil Horman1-7/+17
Augment raw_send_hdrinc to correct for incorrect ip header length values A series of oopses was reported to me recently. Apparently when using AF_RAW sockets to send data to peers that were reachable via ipsec encapsulation, people could panic or BUG halt their systems. I've tracked the problem down to user space sending an invalid ip header over an AF_RAW socket with IP_HDRINCL set to 1. Basically what happens is that userspace sends down an ip frame that includes only the header (no data), but sets the ip header ihl value to a large number, one that is larger than the total amount of data passed to the sendmsg call. In raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr that was passed in, but assume the data is all valid. Later during ipsec encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff to provide headroom for the ipsec headers. During this operation, the skb->transport_header is repointed to a spot computed by skb->network_header + the ip header length (ihl). Since so little data was passed in relative to the value of ihl provided by the raw socket, we point transport header to an unknown location, resulting in various crashes. This fix for this is pretty straightforward, simply validate the value of of iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer size, drop the frame and return -EINVAL. I just confirmed this fixes the reported crashes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29vlan: Add support to netdev_ops.ndo_fcoe_get_wwn for VLAN deviceYi Zou1-0/+13
Implements the netdev_ops.ndo_fcoe_get_wwn for VLAN device. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: Corrected spelling error heurestics->heuristicsAndreas Petlund1-2/+2
Corrected a spelling error in a function name. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: sysfs: ethtool_ops can be NULLEric Dumazet1-2/+6
commit d519e17e2d01a0ee9abe083019532061b4438065 (net: export device speed and duplex via sysfs) made the wrong assumption that netdev->ethtool_ops was always set. This makes possible to crash kernel and let rtnl in locked state. modprobe dummy ip link set dummy0 up (udev runs and crash) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28gre: Optimize multiple unregistrationEric Dumazet1-5/+10
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28ipip: Optimize multiple unregistrationEric Dumazet1-6/+11
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28vlan: Optimize multiple unregistrationEric Dumazet2-16/+34
Use unregister_netdevice_many() to speedup master device unregister. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: add a list_head parameter to dellink() methodEric Dumazet4-13/+13
Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28net: Introduce unregister_netdevice_many()Eric Dumazet1-32/+65
Introduce rollback_registered_many() and unregister_netdevice_many() rollback_registered_many() is able to perform necessary steps at device dismantle time, factorizing two expensive synchronize_net() calls. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>