summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2010-10-06bonding: add retransmit membership reports tunableFlavio Leitner5-0/+72
Allow sysadmins to configure the number of multicast membership report sent on a link failure event. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06bonding: fix to rejoin multicast groups immediatelyFlavio Leitner1-8/+8
The IGMP specs states that if the system receives a membership report, it shouldn't send another for the next minute. However, if a link failure happens right after that, the backup slave and the switch connected to this slave will not know about the multicast and the traffic will hang for about a minute. This patch fixes it to rejoin multicast groups immediately after a failover restoring the multicast traffic. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06bonding: rejoin multicast groups on VLANsFlavio Leitner2-11/+50
During a failover, the IGMP membership is sent to update the switch restoring the traffic, but it misses groups added to VLAN devices running on top of bonding devices. This patch changes it to iterate over all VLAN devices on top of it sending IGMP memberships too. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06ehea: converting msleeps to waitqueue on check_sqs() functionBreno Leitao2-7/+12
Removing the msleep() call in check_sqs() function, and replacing by a wait queue. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06ehea: using wait queues instead of msleep on ehea_flush_sqBreno Leitao2-7/+13
This patch just remove a msleep loop and change to wait queue, making the code cleaner. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06ixgbevf: declare functions as staticEmil Tantilov3-4/+2
Following patch fixes warnings reported by `make namespacecheck` Reported by Stephen Hemminger CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Acked-by: Greg Rose <greg.v.rose@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06AF_UNIX: Implement SO_TIMESTAMP and SO_TIMETAMPNS on Unix socketsAlban Crequy1-0/+5
Userspace applications can already request to receive timestamps with: setsockopt(sockfd, SOL_SOCKET, SO_TIMESTAMP, ...) Although setsockopt() returns zero (success), timestamps are not added to the ancillary data. This patch fixes that on SOCK_DGRAM and SOCK_SEQPACKET Unix sockets. Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06net neigh: RCU conversion of neigh hash tableEric Dumazet6-100/+170
David This is the first step for RCU conversion of neigh code. Next patches will convert hash_buckets[] and "struct neighbour" to RCU protected objects. Thanks [PATCH net-next] net neigh: RCU conversion of neigh hash table Instead of storing hash_buckets, hash_mask and hash_rnd in "struct neigh_table", a new structure is defined : struct neigh_hash_table { struct neighbour **hash_buckets; unsigned int hash_mask; __u32 hash_rnd; struct rcu_head rcu; }; And "struct neigh_table" has an RCU protected pointer to such a neigh_hash_table. This means the signature of (*hash)() function changed: We need to add a third parameter with the actual hash_rnd value, since this is not anymore a neigh_table field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06net neigh: neigh_delete() and neigh_add() changesEric Dumazet1-22/+17
neigh_delete() and neigh_add() dont need to touch device refcount, we hold RTNL when calling them, so device cannot disappear under us. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06net: add a core netdev->rx_dropped counterEric Dumazet11-34/+26
In various situations, a device provides a packet to our stack and we drop it before it enters protocol stack : - softnet backlog full (accounted in /proc/net/softnet_stat) - bad vlan tag (not accounted) - unknown/unregistered protocol (not accounted) We can handle a per-device counter of such dropped frames at core level, and automatically adds it to the device provided stats (rx_dropped), so that standard tools can be used (ifconfig, ip link, cat /proc/net/dev) This is a generalization of commit 8990f468a (net: rx_dropped accounting), thus reverting it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05ppp: Use a real SKB control block in fragmentation engine.David S. Miller1-21/+25
Do this instead of subverting fields in skb proper. The macros that could very easily match variable or function names were also just asking for trouble. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05ipv6: make __ipv6_isatap_ifid staticstephen hemminger2-4/+1
Another exported symbol only used in one file Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05fib: fib_rules_cleanup can be staticstephen hemminger2-3/+1
fib_rules_cleanup_ups is only defined and used in one place. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05fib: cleanupsEric Dumazet3-182/+206
Code style cleanups before upcoming functional changes. C99 initializer for fib_props array. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05wimax: make functions localstephen hemminger6-23/+11
Make wimax variables and functions local if possible. Compile tested only. This also removes a couple of unused EXPORT_SYMBOL. If this breaks some out of tree code, please fix that by putting the code in the kernel tree. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: remove dead codestephen hemminger2-176/+0
This driver has several pieces of dead code (found by running make namespacecheck). This patch removes them. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05caif: remove duplicated includeNicolas Kaiser1-1/+0
Remove duplicated include. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05don't let BCM63XX_PHY depend on non-existant symbolUwe Kleine-König1-1/+0
The kernel doesn't have a symbol called BCM63XX. There is a symbol BCM63XX_ENET (introduced in 9b1fc55a0500, 6 weeks after 09bb9aa0ed that introduced BCM63XX_PHY), but the driver compiles without that, too. Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05net/phy: fix many "defined but unused" warningsUwe Kleine-König15-15/+15
MODULE_DEVICE_TABLE only expands to something if it's compiled for a module. So when building-in support for the phys, the mdio_device_id tables are unused. Marking them with __maybe_unused fixes the following warnings: drivers/net/phy/bcm63xx.c:134: warning: 'bcm63xx_tbl' defined but not used drivers/net/phy/broadcom.c:933: warning: 'broadcom_tbl' defined but not used drivers/net/phy/cicada.c:162: warning: 'cicada_tbl' defined but not used drivers/net/phy/davicom.c:222: warning: 'davicom_tbl' defined but not used drivers/net/phy/et1011c.c:114: warning: 'et1011c_tbl' defined but not used drivers/net/phy/icplus.c:137: warning: 'icplus_tbl' defined but not used drivers/net/phy/lxt.c:226: warning: 'lxt_tbl' defined but not used drivers/net/phy/marvell.c:724: warning: 'marvell_tbl' defined but not used drivers/net/phy/micrel.c:234: warning: 'micrel_tbl' defined but not used drivers/net/phy/national.c:154: warning: 'ns_tbl' defined but not used drivers/net/phy/qsemi.c:141: warning: 'qs6612_tbl' defined but not used drivers/net/phy/realtek.c:82: warning: 'realtek_tbl' defined but not used drivers/net/phy/smsc.c:257: warning: 'smsc_tbl' defined but not used drivers/net/phy/ste10Xp.c:135: warning: 'ste10Xp_tbl' defined but not used drivers/net/phy/vitesse.c:195: warning: 'vitesse_tbl' defined but not used Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05net: relax rtnl_dereference()David S. Miller1-4/+6
rtnl_dereference() is used in contexts where RTNL is held, to fetch an RCU protected pointer. Updates to this pointer are prevented by RTNL, so we dont need smp_read_barrier_depends() and the ACCESS_ONCE() provided in rcu_dereference_check(). rtnl_dereference() is mainly a macro to document the locking invariant. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05ipvs: Use frag walker helper in SCTP proto support.David S. Miller1-9/+10
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Simon Horman <horms@verge.net.au>
2010-10-05net: dynamic ingress_queue allocationEric Dumazet5-27/+71
ingress being not used very much, and net_device->ingress_queue being quite a big object (128 or 256 bytes), use a dynamic allocation if needed (tc qdisc add dev eth0 ingress ...) dev_ingress_queue(dev) helper should be used only with RTNL taken. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: set mtu lower limitSritej Velaga2-3/+4
Setting mtu < 68 is not supported. Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: cleanup port mode settingSritej Velaga1-40/+0
Port mode setting is not required for Qlogic CNA adapters. Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: sparse warning fixesSucheta Chakraborty2-11/+11
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: fix vlan TSO on big endian machineSucheta Chakraborty3-8/+20
o desc->vlan_tci is in __le16 format. Doing htons and cpu_to_le64 again on vlan_tci, result in invalid value on ppc. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: fix endianess for lroSucheta Chakraborty2-3/+10
ipaddress in ifa->ifa_address field are in big endian format. Also device requires ip address in big endian only. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: fix diag registerAmit Kumar Salecha1-3/+3
regs_buff[i] and diag_registers[j] array should use different index variable. Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: fix eswitch statsAmit Kumar Salecha2-9/+34
Some of the counters are not implemented in fw. Fw return NOT AVAILABLE VALUE as (0xffffffffffffffff). Adding these counters, result in invalid value. Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05qlcnic: fix internal loopback testAmit Kumar Salecha2-8/+37
o Loop 10 times with delay of 1 ms to rcv packet. o Print garbage packet. o Try send/receive MAX(16) packet, instead of exit from test, if a packet is not received. Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04Merge branch 'master' of ↵David S. Miller21-59/+124
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/Kconfig net/ipv4/tcp_timer.c
2010-10-04net: introduce DST_NOCACHE flagEric Dumazet3-5/+9
While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04sctp: Fix break indentation in sctp_ioctl().David S. Miller1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04be2net: add multiple RX queue supportSathya Perla5-363/+526
This patch adds multiple RX queue support to be2net. There are upto 4 extra rx-queues per port into which TCP/UDP traffic can be hashed into. Some of the ethtool stats are now displayed on a per queue basis. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04Merge branch 'for-davem' of ↵David S. Miller49-486/+671
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2010-10-04qeth: tagging with VLAN-ID 0Ursula Braun2-14/+15
This patch adapts qeth to handle tagged frames with VLAN-ID 0 and with or without priority information in the tag. It enables qeth to receive priority-tagged frames on a base interface, for example from z/OS, without configuring an additional VLAN interface. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04cxgb4: remove a bogus PCI function number checkDimitris Michailidis1-1/+1
Remove a bogus PCI function number check from the driver's .remove method that causes pci_release_regions not to be called for function 0 if additional functions are attached and one of them is used as primary. Signed-off-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04drivers/atm/idt77252.c: Remove unnecessary error checkJulia Lawall1-4/+2
This code does not call deinit_card(card); in an error case, as done in other error-handling code in the same function. But actually, the called function init_sram can only return 0, so there is no need for the error check at all. init_sram is also given a void return type, and its single return statement at the end of the function is dropped. A simplified version of the sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ @r@ statement S1,S2,S3; constant C1,C2,C3; @@ *if (...) {... S1 return -C1;} ... *if (...) {... when != S1 return -C2;} ... *if (...) {... S1 return -C3;} // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixesAndrew Morton1-1/+2
ERROR: trailing statements should be on next line #23: FILE: drivers/net/tulip/de4x5.c:5477: + if (copy_to_user(ioc->data, tmp.lval, ioc->len)) return -EFAULT; total: 1 errors, 0 warnings, 8 lines checked ./patches/drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl.patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()Dan Rosenberg1-2/+6
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids array and attempts to ensure that only a supported hmac entry is returned. The current code fails to do this properly - if the last id in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the id integer remains set after exiting the loop, and the address of an out-of-bounds entry will be returned and subsequently used in the parent function, causing potentially ugly memory corruption. This patch resets the id integer to 0 on encountering an invalid id so that NULL will be returned after finishing the loop if no valid ids are found. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04sctp: prevent reading out-of-bounds memoryDan Rosenberg1-1/+12
Two user-controlled allocations in SCTP are subsequently dereferenced as sockaddr structs, without checking if the dereferenced struct members fall beyond the end of the allocated chunk. There doesn't appear to be any information leakage here based on how these members are used and additional checking, but it's still worth fixing. [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion] Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04ipv4: correct IGMP behavior on v3 query during v2-compatibility modeDavid Stevens1-1/+13
A recent patch to allow IGMPv2 responses to IGMPv3 queries bypasses length checks for valid query lengths, incorrectly resets the v2_seen timer, and does not support IGMPv1. The following patch responds with a v2 report as required by IGMPv2 while correcting the other problems introduced by the patch. Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04ipmr: cleanupsEric Dumazet1-114/+124
Various code style cleanups Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04ipmr: RCU protection for mfc_cache_arrayEric Dumazet2-40/+48
Use RCU & RTNL protection for mfc_cache_array[] ipmr_cache_find() is called under rcu_read_lock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04ipmr: RCU conversion of mroute_skEric Dumazet1-42/+49
Use RCU and RTNL to protect (struct mr_table)->mroute_sk Readers use RCU, writers use RTNL. ip_ra_control() already use an RCU grace period before ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in mrtsock_destruct() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04ipmr: __pim_rcv() is called under rcu_read_lockEric Dumazet1-6/+4
No need to get a reference on reg_dev and release it, we are in a rcu_read_lock() protected section. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04gre: protocol table can be staticstephen hemminger1-1/+1
This table is only used in gre.c Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04netdev: Depend on INET before selecting INET_LROBen Hutchings1-2/+2
Since 'select' ignores dependencies, drivers that select INET_LRO must depend on INET. This fixes the broken configuration reported in <http://article.gmane.org/gmane.linux.kernel/825646>. Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04Revert "ipv4: Make INET_LRO a bool instead of tristate."Ben Hutchings1-1/+1
This reverts commit e81963b180ac502fda0326edf059b1e29cdef1a2. LRO is now deprecated in favour of GRO, and only a few drivers use it, so it is desirable to build it as a module in distribution kernels. The original change to prevent building it as a module was made in an attempt to avoid the case where some dependents are set to y and some to m, and INET_LRO can be set to m rather than y. However, the Kconfig system will reliably set INET_LRO=y in this case. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-04net: Fix the condition passed to sk_wait_event()Nagendra Tomar1-4/+4
This patch fixes the condition (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global tcp memory pool has exhausted. >>> snip <<< localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429] localhost kernel: CPU 3: localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200 localhost kernel: localhost kernel: Call Trace: localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0 localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140 localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170 localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190 localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90 localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83 >>> snip <<< What is happening is, that the sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to true for the case of tcp global memory exhaustion. This is because both sk_stream_memory_free() and vm_wait are true which causes sk_wait_event() to *not* call schedule_timeout(). Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping. This causes the caller to again try allocation, which again fails and again calls sk_stream_wait_memory(), and so on. [ Bug introduced by commit c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088 ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ] Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>