summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2009-10-27mac80211: add ieee80211_rx_ni()Kalle Valo1-6/+26
ieee80211_rx() must be called with bottom halves disabled. To simplify driver development implement ieee80211_rx_ni() which disables BH. This function must be used when in process context. Signed-off-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-27cfg80211: no cookies in cfg80211_send_XXX()Holger Schurig1-8/+23
Get rid of cookies in cfg80211_send_XXX() functions. Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-23pkt_sched: skbedit add support for setting markjamal1-0/+2
This adds support for setting the skb mark. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-21net: Fix for dst_negative_adviceKrishna Kumar1-2/+10
dst_negative_advice() should check for changed dst and reset sk_tx_queue_mapping accordingly. Pass sock to the callers of dst_negative_advice. (sk_reset_txq is defined just for use by dst_negative_advice. The only way I could find to get around this is to move dst_negative_() from dst.h to dst.c, include sock.h in dst.c, etc) Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-21net: Introduce sk_tx_queue_mappingKrishna Kumar1-0/+26
Introduce sk_tx_queue_mapping; and functions that set, test and get this value. Reset sk_tx_queue_mapping to -1 whenever the dst cache is set/reset, and in socket alloc. Setting txq to -1 and using valid txq=<0 to n-1> allows the tx path to use the value of sk_tx_queue_mapping directly instead of subtracting 1 on every tx. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20net: Avoid compiler warning for mmsghdr when CONFIG_COMPAT is not selectedArnaldo Carvalho de Melo1-1/+5
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19xfrm: remove skb_icv_walkSteffen Klassert1-3/+0
The last users of skb_icv_walk are converted to ahash now, so skb_icv_walk is unused and can be removed. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19ah: Remove obsolete codeSteffen Klassert1-26/+3
ah4 and ah6 are converted to ahash now, so we can remove the code for the obsolete hash algorithm. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19ah: Add struct crypto_ahash to ah_dataSteffen Klassert1-0/+1
To support for ahash algorithms, we add a pointer to a crypto_ahash to ah_data. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19inet: rename some inet_sock fieldsEric Dumazet5-33/+33
In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-15Phonet: routing table Netlink interfaceRémi Denis-Courmont1-0/+1
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-15Phonet: routing table backendRémi Denis-Courmont1-0/+5
The Phonet "universe" only has 64 addresses, so we keep a trivial flat routing table. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-15Phonet: deliver broadcast packets to broadcast socketsRémi Denis-Courmont1-0/+1
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13Merge branch 'master' of ↵David S. Miller2-5/+7
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-10-13Merge branch 'master' of ↵David S. Miller1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2009-10-13tcp: replace ehash_size by ehash_maskEric Dumazet1-2/+2
Storing the mask (size - 1) instead of the size allows fast path to be a bit faster. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13net: Introduce recvmmsg socket syscallArnaldo Carvalho de Melo1-0/+8
Meaning receive multiple messages, reducing the number of syscalls and net stack entry/exit operations. Next patches will introduce mechanisms where protocols that want to optimize this operation will provide an unlocked_recvmsg operation. This takes into account comments made by: . Paul Moore: sock_recvmsg is called only for the first datagram, sock_recvmsg_nosec is used for the rest. . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that works in the same fashion as the ppoll one. If the underlying protocol returns a datagram with MSG_OOB set, this will make recvmmsg return right away with as many datagrams (+ the OOB one) it has received so far. . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen datagrams and then recvmsg returns an error, recvmmsg will return the successfully received datagrams, store the error and return it in the next call. This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg, where we will be able to acquire the lock only at batch start and end, not at every underlying recvmsg call. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13net: Generalize socket rx gap / receive queue overflow cmsgNeil Horman1-0/+3
Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit 977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12mac80211: document ieee80211_rx() context requirementJohannes Berg1-0/+2
ieee80211_rx() must be called with softirqs disabled since the networking stack requires this for netif_rx() and some code in mac80211 can assume that it can not be processing its own tasklet and this call at the same time. It may be possible to remove this requirement after a careful audit of mac80211 and doing any needed locking improvements in it along with disabling softirqs around netif_rx(). An alternative might be to push all packet processing to process context in mac80211, instead of to the tasklet, and add other synchronisation. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-12net: Fix struct sock bitfield annotationEric Dumazet1-5/+5
Since commit a98b65a3 (net: annotate struct sock bitfield), we lost 8 bytes in struct sock on 64bit arches because of kmemcheck_bitfield_end(flags) misplacement. Fix this by putting together sk_shutdown, sk_no_check, sk_userlocks, sk_protocol and sk_type in the 'flags' 32bits bitfield Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-10Merge branch 'master' of ↵David S. Miller4-22/+52
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-10-08udp: dynamically size hash tables at boot timeEric Dumazet1-3/+10
UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for several setups. 4000 active UDP sockets -> 32 sockets per chain in average. An incoming frame has to lookup all sockets to find best match, so long chains hurt latency. Instead of a fixed size hash table that cant be perfect for every needs, let UDP stack choose its table size at boot time like tcp/ip route, using alloc_large_system_hash() helper Add an optional boot parameter, uhash_entries=x so that an admin can force a size between 256 and 65536 if needed, like thash_entries and rhash_entries. dmesg logs two new lines : [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes) [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes) Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non debugging spinlocks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-08cfg80211: add firmware and hardware version to wiphyKalle Valo1-0/+3
It's useful to provide firmware and hardware version to user space and have a generic interface to retrieve them. Users can provide the version information in bug reports etc. Add fields for firmware and hardware version to struct wiphy. (Dropped nl80211 bits for now and modified remaining bits in favor of ethtool. -- JWL) Cc: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-08wext: refactorJohannes Berg4-22/+49
Refactor wext to * split out iwpriv handling * split out iwspy handling * split out procfs support * allow cfg80211 to have wireless extensions compat code w/o CONFIG_WIRELESS_EXT After this, drivers need to - select WIRELESS_EXT - for wext support - select WEXT_PRIV - for iwpriv support - select WEXT_SPY - for iwspy support except cfg80211 -- which gets new hooks in wext-core.c and can then get wext handlers without CONFIG_WIRELESS_EXT. Wireless extensions procfs support is auto-selected based on PROC_FS and anything that requires the wext core (i.e. WIRELESS_EXT or CFG80211_WEXT). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07net: mark net_proto_ops as constStephen Hemminger1-1/+1
All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07pkt_sched: gen_estimator: Dont report fake rate estimatorsEric Dumazet1-0/+1
Jarek Poplawski a écrit : > > > Hmm... So you made me to do some "real" work here, and guess what?: > there is one serious checkpatch warning! ;-) Plus, this new parameter > should be added to the function description. Otherwise: > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > Thanks, > Jarek P. > > PS: I guess full "Don't" would show we really mean it... Okay :) Here is the last round, before the night ! Thanks again [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator is running. # tc -s -d qdisc qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0) rate 0bit 0pps backlog 0b 0p requeues 0 User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake one (because no estimator is active) After this patch, tc command output is : $ tc -s -d qdisc qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 We add a parameter to gnet_stats_copy_rate_est() function so that it can use gen_estimator_active(bstats, r), as suggested by Jarek. This parameter can be NULL if check is not necessary, (htb for example has a mandatory rate estimator) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07ipv6 sit: 6rd (IPv6 Rapid Deployment) Support.YOSHIFUJI Hideaki / 吉藤英明1-0/+13
IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly deploy IPv6 unicast service to IPv4 sites to which it provides customer premise equipment. Like 6to4, it utilizes stateless IPv6 in IPv4 encapsulation in order to transit IPv4-only network infrastructure. Unlike 6to4, a 6rd service provider uses an IPv6 prefix of its own in place of the fixed 6to4 prefix. With this option enabled, the SIT driver offers 6rd functionality by providing additional ioctl API to configure the IPv6 Prefix for in stead of static 2002::/16 for 6to4. Original patch was done by Alexandre Cassen <acassen@freebox.fr> based on old Internet-Draft. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07net: speedup sk_wake_async()Eric Dumazet1-1/+2
An incoming datagram must bring into cpu cache *lot* of cache lines, in particular : (other parts omitted (hash chains, ip route cache...)) On 32bit arches : offsetof(struct sock, sk_rcvbuf) =0x30 (read) offsetof(struct sock, sk_lock) =0x34 (rw) offsetof(struct sock, sk_sleep) =0x50 (read) offsetof(struct sock, sk_rmem_alloc) =0x64 (rw) offsetof(struct sock, sk_receive_queue)=0x74 (rw) offsetof(struct sock, sk_forward_alloc)=0x98 (rw) offsetof(struct sock, sk_callback_lock)=0xcc (rw) offsetof(struct sock, sk_drops) =0xd8 (read if we add dropcount support, rw if frame dropped) offsetof(struct sock, sk_filter) =0xf8 (read) offsetof(struct sock, sk_socket) =0x138 (read) offsetof(struct sock, sk_data_ready) =0x15c (read) We can avoid sk->sk_socket and socket->fasync_list referencing on sockets with no fasync() structures. (socket->fasync_list ptr is probably already in cache because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep) This avoids one cache line load per incoming packet for common cases (no fasync()) We can leave (or even move in a future patch) sk->sk_socket in a cold location Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05tunnels: Optimize tx pathEric Dumazet1-3/+3
We currently dirty a cache line to update tunnel device stats (tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets counters that already are present in cpu cache, in the cache line shared with txq->_xmit_lock This patch extends IPTUNNEL_XMIT() macro to use txq pointer provided by the caller. Also &tunnel->dev->stats can be replaced by &dev->stats Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05ipv4: fib table algorithm performance improvementStephen Hemminger1-11/+14
The FIB algorithim for IPV4 is set at compile time, but kernel goes through the overhead of function call indirection at runtime. Save some cycles by turning the indirect calls to direct calls to either hash or trie code. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-01net: Make setsockopt() optlen be unsigned.David S. Miller8-20/+20
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-29wext: add back wireless/ dir in sysfs for cfg80211 interfacesJohannes Berg1-0/+1
The move away from having drivers assign wireless handlers, in favour of making cfg80211 assign them, broke the sysfs registration (the wireless/ dir went missing) because the handlers are now assigned only after registration, which is too late. Fix this by special-casing cfg80211-based devices, all of which are required to have an ieee80211_ptr, in the sysfs code, and also using get_wireless_stats() to have the same values reported as in procfs. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-27Revert "sit: stateless autoconf for isatap"Sascha Hlusiak1-7/+0
This reverts commit 645069299a1c7358cf7330afe293f07552f11a5d. While the code does not actually break anything, it does not completely follow RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better implemented in userspace. The kernel should not send RS packages for ISATAP at all. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-25tunnel: eliminate recursion fieldEric Dumazet1-1/+0
It seems recursion field from "struct ip_tunnel" is not anymore needed. recursion prevention is done at the upper level (in dev_queue_xmit()), since we use HARD_TX_LOCK protection for tunnels. This avoids a cache line ping pong on "struct ip_tunnel" : This structure should be now mostly read on xmit and receive paths. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan2-3/+1
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-239p: Add fscache support to 9pAbhishek Kulkarni1-0/+3
This patch adds a persistent, read-only caching facility for 9p clients using the FS-Cache caching backend. When the fscache facility is enabled, each inode is associated with a corresponding vcookie which is an index into the FS-Cache indexing tree. The FS-Cache indexing tree is indexed at 3 levels: - session object associated with each mount. - inode/vcookie - actual data (pages) A cache tag is chosen randomly for each session. These tags can be read off /sys/fs/9p/caches and can be passed as a mount-time parameter to re-attach to the specified caching session. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-09-15pkt_sched: Fix tx queue selection in tc_modify_qdiscJarek Poplawski1-1/+1
After the recent mq change there is the new select_queue qdisc class method used in tc_modify_qdisc, but it works OK only for direct child qdiscs of mq qdisc. Grandchildren always get the first tx queue, which would give wrong qdisc_root etc. results (e.g. for sch_htb as child of sch_prio). This patch fixes it by using parent's dev_queue for such grandchildren qdiscs. The select_queue method's return type is changed BTW. With feedback from: Patrick McHardy <kaber@trash.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15bonding: remap muticast addresses without using dev_close() and dev_open()Moni Shoua1-0/+2
This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15tcp: fix ssthresh u16 leftoverIlpo Järvinen1-0/+7
It was once upon time so that snd_sthresh was a 16-bit quantity. ...That has not been true for long period of time. I run across some ancient compares which still seem to trust such legacy. Put all that magic into a single place, I hopefully found all of them. Compile tested, though linking of allyesconfig is ridiculous nowadays it seems. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15net: constify struct inet6_protocolAlexey Dobriyan1-3/+3
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15net: constify struct net_protocolAlexey Dobriyan1-4/+3
Remove long removed "inet_protocol_base" declaration. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-11Merge branch 'master' of ↵David S. Miller3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2009-09-10net_sched: fix estimator lock selection for mq child qdiscsPatrick McHardy1-0/+1
When new child qdiscs are attached to the mq qdisc, they are actually attached as root qdiscs to the device queues. The lock selection for new estimators incorrectly picks the root lock of the existing and to be replaced qdisc, which results in a use-after-free once the old qdisc has been destroyed. Mark mq qdisc instances with a new flag and treat qdiscs attached to mq as children similar to regular root qdiscs. Additionally prevent estimators from being attached to the mq qdisc itself since it only updates its byte and packet counters during dumps. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06net_sched: add classful multiqueue dummy schedulerDavid S. Miller1-0/+4
This patch adds a classful dummy scheduler which can be used as root qdisc for multiqueue devices and exposes each device queue as a child class. This allows to address queues individually and graft them similar to regular classes. Additionally it presents an accumulated view of the statistics of all real root qdiscs in the dummy root. Two new callbacks are added to the qdisc_ops and qdisc_class_ops: - cl_ops->select_queue selects the tx queue number for new child classes. - qdisc_ops->attach() overrides root qdisc device grafting to attach non-shared qdiscs to the queues. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06net_sched: move dev_graft_qdisc() to sch_generic.cPatrick McHardy1-0/+2
It will be used in a following patch by the multiqueue qdisc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-05sctp: turn flags in 'struct sctp_association' into bit fieldsWei Yongjun1-12/+9
This shrinks the size of struct sctp_association a little. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-05sctp: Sysctl configuration for IPv4 Address ScopingBhaskar Dutta2-0/+17
This patch introduces a new sysctl option to make IPv4 Address Scoping configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>. In networking environments where DNAT rules in iptables prerouting chains convert destination IP's to link-local/private IP addresses, SCTP connections fail to establish as the INIT chunk is dropped by the kernel due to address scope match failure. For example to support overlapping IP addresses (same IP address with different vlan id) a Layer-5 application listens on link local IP's, and there is a DNAT rule that maps the destination IP to a link local IP. Such applications never get the SCTP INIT if the address-scoping draft is strictly followed. This sysctl configuration allows SCTP to function in such unconventional networking environments. Sysctl options: 0 - Disable IPv4 address scoping draft altogether 1 - Enable IPv4 address scoping (default, current behavior) 2 - Enable address scoping but allow IPv4 private addresses in init/init-ack 3 - Enable address scoping but allow IPv4 link local address in init/init-ack Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-05sctp: Turn flags in 'sctp_packet' into bit fieldsVlad Yasevich1-16/+6
This shrinks the size of sctp_packet a little. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-05sctp: Fix SCTP_MAXSEG socket option to comply to spec.Vlad Yasevich2-3/+5
We had a bug that we never stored the user-defined value for MAXSEG when setting the value on an association. Thus future PMTU events ended up re-writing the frag point and increasing it past user limit. Additionally, when setting the option on the socket/endpoint, we effect all current associations, which is against spec. Now, we store the user 'maxseg' value along with the computed 'frag_point'. We inherit 'maxseg' from the socket at association creation and use it as an upper limit for 'frag_point' when its set. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-05sctp: Don't do NAGLE delay on large writes that were fragmented smallVlad Yasevich1-1/+1
SCTP will delay the last part of a large write due to NAGLE, if that part is smaller then MTU. Since we are doing large writes, we might as well send the last portion now instead of waiting untill the next large write happens. The small portion will be sent as is regardless, so it's better to not delay it. This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com> and Doug Graham <dgraham@nortel.com>. Many thanks go out to them. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>