summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2007-11-13[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.Radu Rendec1-2/+2
While trying to implement u32 hashes in my shaping machine I ran into a possible bug in the u32 hash/bucket computing algorithm (net/sched/cls_u32.c). The problem occurs only with hash masks that extend over the octet boundary, on little endian machines (where htonl() actually does something). Let's say that I would like to use 0x3fc0 as the hash mask. This means 8 contiguous "1" bits starting at b6. With such a mask, the expected (and logical) behavior is to hash any address in, for instance, 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in bucket 1, then 192.168.0.128/26 in bucket 2 and so on. This is exactly what would happen on a big endian machine, but on little endian machines, what would actually happen with current implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace tool and then applied to 192.168.x.x in the u32 classifier. When shifting right by 16 bits (rank of first "1" bit in the reversed mask) and applying the divisor mask (0xff for divisor 256), what would actually remain is 0x3f applied on the "168" octet of the address. One could say is this can be easily worked around by taking endianness into account in userspace and supplying an appropriate mask (0xfc03) that would be turned into contiguous "1" bits when reversed (0x03fc0000). But the actual problem is the network address (inside the packet) not being converted to host order, but used as a host-order value when computing the bucket. Let's say the network address is written as n31 n30 ... n0, with n0 being the least significant bit. When used directly (without any conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8 would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be consecutive. The fix is to apply ntohl() on the hmask before computing fshift, and in u32_hash_fold() convert the packet data to host order before shifting down by fshift. With helpful feedback from Jamal Hadi Salim and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-11-13[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing disciplineEvgeniy Polyakov1-0/+3
[ Upstream commit: 4f9f8311a08c0d95c70261264a2b47f2ae99683a ] tecl_reset() is called from deactivate and qdisc is set to noop already, but subsequent teql_xmit does not know about it and dereference private data as teql qdisc and thus oopses. not catch it first :) Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-18[PKT_SCHED] cls_u32: error code isn't been propogated properlyStephen Hemminger1-1/+1
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-05-23[NET_SCHED]: prio qdisc boundary conditionJamal Hadi Salim1-1/+1
This fixes an out-of-boundary condition when the classified band equals q->bands. Caught by Alexey Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-04-14[NET_SCHED]: cls_tcindex: fix compatibility breakagePatrick McHardy1-2/+2
Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed to expect and use a u16 value in 2.6.11, which broke compatibility on big endian machines. Change back to use int. Reported by Ole Reinartz <ole.reinartz@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-04-14[NET_SCHED]: cls_basic: fix memory leak in basic_destroyPatrick McHardy1-0/+1
tp->root is not freed on destruction. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-04-03[IFB]: Fix crash on input device removalPatrick McHardy1-1/+1
The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-28[NET_SCHED]: cls_basic: fix NULL pointer dereferencePatrick McHardy1-10/+7
cls_basic doesn't allocate tp->root before it is linked into the active classifier list, resulting in a NULL pointer dereference when packets hit the classifier before its ->change function is called. Reported by Chris Madden <chris@reflexsecurity.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-03-28[NET_SCHED]: Fix endless loops caused by inaccurate qlen countersPatrick McHardy11-48/+126
There are multiple problems related to qlen adjustment that can lead to an upper qdisc getting out of sync with the real number of packets queued, leading to endless dequeueing attempts by the upper layer code. All qdiscs must maintain an accurate q.qlen counter. There are basically two groups of operations affecting the qlen: operations that propagate down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the root qdisc and operations only affecting a subtree or single qdisc (change, graft, delete class). Since qlen changes during operations from the second group don't propagate to ancestor qdiscs, their qlen values become desynchronized. This patch adds a function to propagate qlen changes up the qdisc tree, optionally calling a callback function to perform qdisc-internal maintenance when the child qdisc is deactivated, and converts all qdiscs to use this where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-04NET_SCHED: Fix fallout from dev->qdisc RCU changePatrick McHardy3-55/+31
The move of qdisc destruction to a rcu callback broke locking in the entire qdisc layer by invalidating previously valid assumptions about the context in which changes to the qdisc tree occur. The two assumptions were: - since changes only happen in process context, read_lock doesn't need bottem half protection. Now invalid since destruction of inner qdiscs, classifiers, actions and estimators happens in the RCU callback unless they're manually deleted, resulting in dead-locks when read_lock in process context is interrupted by write_lock_bh in bottem half context. - since changes only happen under the RTNL, no additional locking is necessary for data not used during packet processing (f.e. u32_list). Again, since destruction now happens in the RCU callback, this assumption is not valid anymore, causing races while using this data, which can result in corruption or use-after-free. Instead of "fixing" this by disabling bottem halfs everywhere and adding new locks/refcounting, this patch makes these assumptions valid again by moving destruction back to process context. Since only the dev->qdisc pointer is protected by RCU, but ->enqueue and the qdisc tree are still protected by dev->qdisc_lock, destruction of the tree can be performed immediately and only the final free needs to happen in the rcu callback to make sure dev_queue_xmit doesn't access already freed memory. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[NET_SCHED]: policer: restore compatibility with old iproute binariesPatrick McHardy1-4/+22
The tc actions increased the size of struct tc_police, which broke compatibility with old iproute binaries since both the act_police and the old NET_CLS_POLICE code check for an exact size match. Since the new members are not even used, the simple fix is to also accept the size of the old structure. Dumping is not affected since old userspace will receive a bigger structure, which is handled fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[PKT_SCHED] act_gact: division by zeroKim Nordlund1-2/+2
Not returning -EINVAL, because someone might want to use the value zero in some future gact_prob algorithm? Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-07[MAINTAINERS]: Add proper entry for TC classifierStephen Hemminger1-2/+0
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-07[PKT_SCHED]: act_api: Fix module leak while flushing actionsThomas Graf1-1/+1
Module reference needs to be given back if message header construction fails. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-07PKT_SCHED: Return ENOENT if action module is unavailableThomas Graf1-0/+1
Return ENOENT if action module is unavailable Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-07PKT_SCHED: Fix illegal memory dereferences when dumping actionsThomas Graf1-6/+5
The TCA_ACT_KIND attribute is used without checking its availability when dumping actions therefore leading to a value of 0x4 being dereferenced. The use of strcmp() in tc_lookup_action_n() isn't safe when fed with string from an attribute without enforcing proper NUL termination. Both bugs can be triggered with malformed netlink message and don't require any privileges. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-07PKT_SCHED: Fix error handling while dumping actionsThomas Graf1-2/+4
"return -err" and blindly inheriting the error code in the netlink failure exception handler causes errors codes to be returned as positive value therefore making them being ignored by the caller. May lead to sending out incomplete netlink messages. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-14PKT_SCHED: cls_basic: Use unsigned int when generating handleKim Nordlund1-1/+1
Prevents filters from being added if the first generated handle already exists. Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-13[NET_SCHED]: act_api: fix skb leak in error pathPatrick McHardy1-1/+1
The skb is allocated by the function, so it needs to be freed instead of trimmed on overrun. Coverity #614 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18[PKT_SCHED]: Handle SCTP/DCCP in sfq_hashPatrick McHardy1-0/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17[PKT_SCHED] sch_prio: fix qdisc bands initAmnon Aaronsohn1-4/+3
Currently when PRIO is configured to use N bands, it lets the packets be directed to any of the bands 0..N-1. However, PRIO attaches a fifo qdisc only to the bands that appear in the priomap; the rest of the N bands remain with a noop qdisc attached. This patch changes PRIO's behavior so that it attaches a fifo qdisc to all of the N bands. Signed-off-by: Amnon Aaronsohn <bla@cs.huji.ac.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-14[PKT_SCHED]: Change default clock source to gettimeofdayPatrick McHardy1-1/+1
The default of using jiffies is very bad and results in underutilization except with very low bandwidth. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tablesHarald Welte1-1/+1
This monster-patch tries to do the best job for unifying the data structures and backend interfaces for the three evil clones ip_tables, ip6_tables and arp_tables. In an ideal world we would never have allowed this kind of copy+paste programming... but well, our world isn't (yet?) ideal. o introduce a new x_tables module o {ip,arp,ip6}_tables depend on this x_tables module o registration functions for tables, matches and targets are only wrappers around x_tables provided functions o all matches/targets that are used from ip_tables and ip6_tables are now implemented as xt_FOOBAR.c files and provide module aliases to ipt_FOOBAR and ip6t_FOOBAR o header files for xt_matches are in include/linux/netfilter/, include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers around the xt_FOOBAR.h headers Based on this patchset we're going to further unify the code, gradually getting rid of all the layer 3 specific assumptions. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12[PKT_SCHED] net/sched/Kconfig: fix typo in NET_EMATCH_META descriptionAdrian Bunk1-1/+1
Noted by Matt LaPlante <webmaster@cyberdogtech.com>. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12[PKT_SCHED] ematch: Remove bogus include.Evgeniy Polyakov1-1/+0
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PKT_SCHED]: Fix qdisc return code.Jamal Hadi Salim4-9/+10
The mapping between TC_ACTION_SHOT and the qdisc return codes is better suited to NET_XMIT_BYPASS so as not to confuse TCP Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PKT_SCHED]: Prefix tc actions with act_Patrick McHardy8-8/+8
Clean up the net/sched directory a bit by prefix all actions with act_. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PKT_SCHED]: Fix memory leak when dumping in pedit actionPatrick McHardy1-0/+2
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PKT_SCHED]: Remove some obsolete policer exportsPatrick McHardy1-11/+3
Also make sure the legacy code is only built when CONFIG_NET_CLS_ACT is not set. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PKT_SCHED]: Convert tc action functions to single skb pointersPatrick McHardy7-13/+10
tcf_action_exec only gets a single skb pointer and doesn't own the skb, but passes double skb pointers (to a local variable) to the action functions. Change to use single skb pointers everywhere. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PKT_SCHED]: Use USEC_PER_SECPatrick McHardy1-4/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[NET]: Convert net/{ipv4,ipv6,sched} to netdev_privPatrick McHardy1-6/+6
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.hArnaldo Carvalho de Melo1-0/+1
To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04[PKT_SCHED] netem: packet corruption optionStephen Hemminger1-3/+46
Here is a new feature for netem in 2.6.16. It adds the ability to randomly corrupt packets with netem. A version was done by Hagen Paul Pfeifer, but I redid it to handle the cases of backwards compatibility with netlink interface and presence of hardware checksum offload. It is useful for testing hardware offload in devices. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-14[PKT_SCHED]: Disable debug tracing logs by default in packet action API.David S. Miller1-1/+1
Noticed by Andi Kleen. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-21[PKT_SCHED]: sch_netem: correctly order packets to be sent simultaneouslyAndrea Bittau1-1/+1
If two packets were queued to be sent at the same time in the future, their order would be reversed. This would occur because the queue is traversed back to front, and a position is found by checking whether the new packet needs to be sent before the packet being examined. If the new packet is to be sent at the same time of a previous packet, it would end up before the old packet in the queue. This patch places packets in the correct order when they are queued to be sent at a same time in the future. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-18[NET]: Sanitize NET_SCHED protection in /net/sched/KconfigRoman Zippel1-29/+8
On Thu, 17 Nov 2005, David Gómez wrote: > I found out that if i select NET_CLS_ROUTE4, save my changes and exit > menuconfig, execute again make menuconfig and go to QoS options, then the new > available options are visible. So menuconfig has some problem refreshing > contents :? No, they were there before too, but you have to go up one level to see them. It's better in 2.6.15-rc1-git5, but the menu structure is still a little messed up, the patch below properly indents all menu entries. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08[NET]: kfree cleanupJesper Juhl6-16/+9
From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-11-08[PKT_SCHED]: Correctly handle empty ematch treesThomas Graf1-0/+5
Fixes an invalid memory reference when the basic classifier is used without any ematches but just actions. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-06Merge branch 'red' of 84.73.165.173:/home/tgr/repos/net-2.6Arnaldo Carvalho de Melo2-741/+518
2005-11-06[NETEM]: Add version stringStephen Hemminger1-0/+3
Add a version string to help support issues. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[NETEM]: Support time based reorderingStephen Hemminger1-1/+84
Change netem to support packets getting reordered because of variations in delay. Introduce a special case version of FIFO that queues packets in order based on the netem delay. Since netem is classful, those users that don't want jitter based reordering can just insert a pfifo instead of the default. This required changes to generic skbuff code to allow finer grain manipulation of sk_buff_head. Insertion into the middle and reverse walk. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: (G)RED: Introduce hard droppingThomas Graf2-2/+14
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN marking is enabled packets should still be dropped once the average queue length exceeds the maximum threshold. This _may_ help to avoid global synchronisation during small bursts of peers advertising but not caring about ECN. Use this option very carefully, it does more harm than good if (qth_max - qth_min) does not cover at least two average burst cycles. The difference to the current behaviour, in which we'd run into the hard queue limit, is that due to the low pass filter of RED short bursts are less likely to cause a global synchronisation. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Support ECN markingThomas Graf1-4/+21
Adds a new u8 flags in a unused padding area of the netlink message. Adds ECN marking support to be used instead of dropping packets immediately. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Fix restart of idle period in WRED mode upon dequeue and dropThomas Graf1-2/+2
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Cleanup and remove unnecessary codeThomas Graf1-69/+31
Removes unnecessary includes, initializers, and simplifies the code a bit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Remove auto-creation of default VQThomas Graf1-9/+0
Since we are no longer depending on the default VQ to be always allocated we can leave it up to the user to actually create it. This gives the user the ability to leave it out on purpose and enqueue packets directly to the device without applying the RED algorithm. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Dont abuse default VQ for equalizingThomas Graf1-17/+20
Introduces a new red parameter set for use in equalize mode, although only the qavg variable and the idle period marker are being used for now this makes it possible to allow a separate parameter set to be used for equalize later on. The use of this separate parameter set fixes a bogus start of an idle period in gred_drop() which did start an idle period on the default VQ even if equalize mode was disabled. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Remove initd flagThomas Graf1-14/+1
The case when the default VQ is not set up yet is already handled in a less error prone way. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-06[PKT_SCHED]: GRED: Improve error handling and messagesThomas Graf1-24/+44
Try to enqueue packets if we cannot associate it with a VQ, this basically means that the default VQ has not been set up yet. We must check if the VQ still exists while requeueing, the VQ might have been changed between dequeue and the requeue of the underlying qdisc. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>