summaryrefslogtreecommitdiff
path: root/net/sched/cls_api.c
AgeCommit message (Collapse)AuthorFilesLines
2019-01-31net_sched: refetch skb protocol for each filterCong Wang1-2/+1
[ Upstream commit cd0c4e70fc0ccfa705cdf55efb27519ce9337a26 ] Martin reported a set of filters don't work after changing from reclassify to continue. Looking into the code, it looks like skb protocol is not always fetched for each iteration of the filters. But, as demonstrated by Martin, TC actions could modify skb->protocol, for example act_vlan, this means we have to refetch skb protocol in each iteration, rather than using the one we fetch in the beginning of the loop. This bug is _not_ introduced by commit 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}"), technically, if act_vlan is the only action that modifies skb protocol, then it is commit c7e2b9689ef8 ("sched: introduce vlan action") which introduced this bug. Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-04net/sched: cls_api: add missing validation of netlink attributesDavide Caratti1-2/+5
[ Upstream commit e331473fee3d500bb0d2582a1fe598df3326d8cd ] Similarly to what has been done in 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes"), fix classifier code to add validation of TCA_CHAIN and TCA_KIND netlink attributes. tested with: # ./tdc.py -c filter v2: Let sch_api and cls_api share nla_policy they have in common, thanks to David Ahern. v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done by TC modules, thanks to Cong Wang. While at it, restore the 'Delete / get qdisc' comment to its orginal position, just above tc_get_qdisc() function prototype. Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19net: sched: fix error path in tcf_proto_create() when modules are not configuredJiri Pirko1-1/+1
[ Upstream commit d68d75fdc34b0253c2bded7ed18cd60eb5a9599b ] In case modules are not configured, error out when tp->ops is null and prevent later null pointer dereference. Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09net: sched: report if filter is too large to dumpRoman Kapl1-1/+6
[ Upstream commit 5ae437ad5a2ed573b1ebb04e0afa70b8869f88dd ] So far, if the filter was too large to fit in the allocated skb, the kernel did not return any error and stopped dumping. Modify the dumper so that it returns -EMSGSIZE when a filter fails to dump and it is the first filter in the skb. If we are not first, we will get a next chance with more room. I understand this is pretty near to being an API change, but the original design (silent truncation) can be considered a bug. Note: The error case can happen pretty easily if you create a filter with 32 actions and have 4kb pages. Also recent versions of iproute try to be clever with their buffer allocation size, which in turn leads to Signed-off-by: Roman Kapl <code@rkapl.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03net: sched: fix use-after-free in tcf_block_put_extJiri Pirko1-6/+7
commit df45bf84e4f5a48f23d4b1a07d21d566e8b587b2 upstream. Since the block is freed with last chain being put, once we reach the end of iteration of list_for_each_entry_safe, the block may be already freed. I'm hitting this only by creating and deleting clsact: [ 202.171952] ================================================================== [ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390 [ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796 [ 202.194508] [ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5 [ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 202.213613] Call Trace: [ 202.216369] dump_stack+0xda/0x169 [ 202.220192] ? dma_virt_map_sg+0x147/0x147 [ 202.224790] ? show_regs_print_info+0x54/0x54 [ 202.229691] ? tcf_chain_destroy+0x1dc/0x250 [ 202.234494] print_address_description+0x83/0x3d0 [ 202.239781] ? tcf_block_put_ext+0x240/0x390 [ 202.244575] kasan_report+0x1ba/0x460 [ 202.248707] ? tcf_block_put_ext+0x240/0x390 [ 202.253518] tcf_block_put_ext+0x240/0x390 [ 202.258117] ? tcf_chain_flush+0x290/0x290 [ 202.262708] ? qdisc_hash_del+0x82/0x1a0 [ 202.267111] ? qdisc_hash_add+0x50/0x50 [ 202.271411] ? __lock_is_held+0x5f/0x1a0 [ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress] [ 202.281323] qdisc_destroy+0xcb/0x240 [ 202.285445] qdisc_graft+0x216/0x7b0 [ 202.289497] tc_get_qdisc+0x260/0x560 Fix this by holding the block also by chain 0 and put chain 0 explicitly, out of the list_for_each_entry_safe loop at the very end of tcf_block_put_ext. Fixes: efbf78973978 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03net_sched: get rid of rcu_barrier() in tcf_block_put_ext()Cong Wang1-20/+9
commit efbf78973978b0d25af59bc26c8013a942af6e64 upstream. Both Eric and Paolo noticed the rcu_barrier() we use in tcf_block_put_ext() could be a performance bottleneck when we have a lot of tc classes. Paolo provided the following to demonstrate the issue: tc qdisc add dev lo root htb for I in `seq 1 1000`; do tc class add dev lo parent 1: classid 1:$I htb rate 100kbit tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb for J in `seq 1 10`; do tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J done done time tc qdisc del dev root real 0m54.764s user 0m0.023s sys 0m0.000s The rcu_barrier() there is to ensure we free the block after all chains are gone, that is, to queue tcf_block_put_final() at the tail of workqueue. We can achieve this ordering requirement by refcnt'ing tcf block instead, that is, the tcf block is freed only when the last chain in this block is gone. This also simplifies the code. Paolo reported after this patch we get: real 0m0.017s user 0m0.000s sys 0m0.017s Tested-by: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03net: sched: crash on blocks with goto chain actionRoman Kapl1-5/+12
commit a60b3f515d30d0fe8537c64671926879a3548103 upstream. tcf_block_put_ext has assumed that all filters (and thus their goto actions) are destroyed in RCU callback and thus can not race with our list iteration. However, that is not true during netns cleanup (see tcf_exts_get_net comment). Prevent the user after free by holding all chains (except 0, that one is already held). foreach_safe is not enough in this case. To reproduce, run the following in a netns and then delete the ns: ip link add dtest type dummy tc qdisc add dev dtest ingress tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2 Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()") Signed-off-by: Roman Kapl <code@rkapl.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03net: sched: fix crash when deleting secondary chainsRoman Kapl1-3/+4
commit d7aa04a5e82b4f254d306926c81eae8df69e5200 upstream. If you flush (delete) a filter chain other than chain 0 (such as when deleting the device), the kernel may run into a use-after-free. The chain refcount must not be decremented unless we are sure we are done with the chain. To reproduce the bug, run: ip link add dtest type dummy tc qdisc add dev dtest ingress tc filter add dev dtest chain 1 parent ffff: flower ip link del dtest Introduced in: commit f93e1cdcf42c ("net/sched: fix filter flushing"), but unless you have KAsan or luck, you won't notice it until commit 0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()") Fixes: f93e1cdcf42c ("net/sched: fix filter flushing") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Roman Kapl <code@rkapl.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-09net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()Cong Wang1-0/+1
Instead of holding netns refcnt in tc actions, we can minimize the holding time by saving it in struct tcf_exts instead. This means we can just hold netns refcnt right before call_rcu() and release it after tcf_exts_destroy() is done. However, because on netns cleanup path we call tcf_proto_destroy() too, obviously we can not hold netns for a zero refcnt, in this case we have to do cleanup synchronously. It is fine for RCU too, the caller cleanup_net() already waits for a grace period. For other cases, refcnt is non-zero and we can safely grab it as normal and release it after we are done. This patch provides two new API for each filter to use: tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can use the following pattern: void __destroy_filter() { tcf_exts_destroy(); tcf_exts_put_net(); // <== release netns refcnt kfree(); } void some_work() { rtnl_lock(); __destroy_filter(); rtnl_unlock(); } void some_rcu_callback() { tcf_queue_work(some_work); } if (tcf_exts_get_net()) // <== hold netns refcnt call_rcu(some_rcu_callback); else __destroy_filter(); Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-31net_sched: remove tcf_block_put_deferred()Cong Wang1-29/+8
In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter") I defer tcf_chain_flush() to a workqueue, this causes a use-after-free because qdisc is already destroyed after we queue this work. The tcf_block_put_deferred() is no longer necessary after we get RTNL for each tc filter destroy work, no others could jump in at this point. Same for tcf_chain_hold(), we are fully serialized now. This also reduces one indirection therefore makes the code more readable. Note this brings back a rcu_barrier(), however comparing to the code prior to commit 7aa0045dadb6 we still reduced one rcu_barrier(). For net-next, we can consider to refcnt tcf block to avoid it. Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter") Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29net_sched: add rtnl assertion to tcf_exts_destroy()Cong Wang1-0/+1
After previous patches, it is now safe to claim that tcf_exts_destroy() is always called with RTNL lock. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29net_sched: introduce a workqueue for RCU callbacks of tc filterCong Wang1-17/+51
This patch introduces a dedicated workqueue for tc filters so that each tc filter's RCU callback could defer their action destroy work to this workqueue. The helper tcf_queue_work() is introduced for them to use. Because we hold RTNL lock when calling tcf_block_put(), we can not simply flush works inside it, therefore we have to defer it again to this workqueue and make sure all flying RCU callbacks have already queued their work before this one, in other words, to ensure this is the last one to execute to prevent any use-after-free. On the other hand, this makes tcf_block_put() ugly and harder to understand. Since David and Eric strongly dislike adding synchronize_rcu(), this is probably the only solution that could make everyone happy. Please also see the code comments below. Reported-by: Chris Mi <chrism@mellanox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-13net_sched: carefully handle tcf_block_put()Cong Wang1-6/+18
As pointed out by Jiri, there is still a race condition between tcf_block_put() and tcf_chain_destroy() in a RCU callback. There is no way to make it correct without proper locking or synchronization, because both operate on a shared list. Locking is hard, because the only lock we can pick here is a spinlock, however, in tc_dump_tfilter() we iterate this list with a sleeping function called (tcf_chain_dump()), which makes using a lock to protect chain_list almost impossible. Jiri suggested the idea of holding a refcnt before flushing, this works because it guarantees us there would be no parallel tcf_chain_destroy() during the loop, therefore the race condition is gone. But we have to be very careful with proper synchronization with RCU callbacks. Suggested-by: Jiri Pirko <jiri@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-13net_sched: fix reference counting of tc filter chainCong Wang1-21/+26
This patch fixes the following ugliness of tc filter chain refcnt: a) tp proto should hold a refcnt to the chain too. This significantly simplifies the logic. b) Chain 0 is no longer special, it is created with refcnt=1 like any other chains. All the ugliness in tcf_chain_put() can be gone! c) No need to handle the flushing oddly, because block still holds chain 0, it can not be released, this guarantees block is the last user. d) The race condition with RCU callbacks is easier to handle with just a rcu_barrier(). Much easier to understand, nothing to hide. Thanks to the previous patch. Please see also the comments in code. e) Make the code understandable by humans, much less error-prone. Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times") Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08net: sched: fix memleak for chain zeroJiri Pirko1-9/+9
There's a memleak happening for chain 0. The thing is, chain 0 needs to be always present, not created on demand. Therefore tcf_block_get upon creation of block calls the tcf_chain_create function directly. The chain is created with refcnt == 1, which is not correct in this case and causes the memleak. So move the refcnt increment into tcf_chain_get function even for the case when chain needs to be created. Reported-by: Jakub Kicinski <kubakici@wp.pl> Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+12
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-26net_sched: remove tc class reference countingWANG Cong1-11/+6
For TC classes, their ->get() and ->put() are always paired, and the reference counting is completely useless, because: 1) For class modification and dumping paths, we already hold RTNL lock, so all of these ->get(),->change(),->put() are atomic. 2) For filter bindiing/unbinding, we use other reference counter than this one, and they should have RTNL lock too. 3) For ->qlen_notify(), it is special because it is called on ->enqueue() path, but we already hold qdisc tree lock there, and we hold this tree lock when graft or delete the class too, so it should not be gone or changed until we release the tree lock. Therefore, this patch removes ->get() and ->put(), but: 1) Adds a new ->find() to find the pointer to a class by classid, no refcnt. 2) Move the original class destroy upon the last refcnt into ->delete(), right after releasing tree lock. This is fine because the class is already removed from hash when holding the lock. For those who also use ->put() as ->unbind(), just rename them to reflect this change. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: sched: don't do tcf_chain_flush from tcf_chain_destroyJiri Pirko1-3/+3
tcf_chain_flush needs to be called with RTNL. However, on free_tcf-> tcf_action_goto_chain_fini-> tcf_chain_put-> tcf_chain_destroy-> tcf_chain_flush callpath, it is called without RTNL. This issue was notified by following warning: [ 155.599052] WARNING: suspicious RCU usage [ 155.603165] 4.13.0-rc5jiri+ #54 Not tainted [ 155.607456] ----------------------------- [ 155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage! Since on this callpath, the chain is guaranteed to be already empty by check in tcf_chain_put, move the tcf_chain_flush call out and call it only where it is needed - into tcf_block_put. Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23net: sched: fix use after free when tcf_chain_destroy is called multiple timesJiri Pirko1-2/+10
The goto_chain termination action takes a reference of a chain. In that case, there is an issue when block_put is called tcf_chain_destroy directly. The follo-up call of tcf_chain_put by goto_chain action free works with memory that is already freed. This was caught by kasan: [ 220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50 [ 220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261 [ 220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54 [ 220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016 [ 220.371784] Call Trace: [ 220.374290] <IRQ> [ 220.376355] dump_stack+0xd5/0x150 [ 220.391485] print_address_description+0x86/0x410 [ 220.396308] kasan_report+0x181/0x4c0 [ 220.415211] tcf_chain_put+0x1b/0x50 [ 220.418949] free_tcf+0x95/0xc0 So allow tcf_chain_destroy to be called multiple times, free only in case the reference count drops to 0. Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
2017-08-18net: sched: fix p_filter_chain check in tcf_chain_flushJiri Pirko1-1/+1
The dereference before check is wrong and leads to an oops when p_filter_chain is NULL. The check needs to be done on the pointer to prevent NULL dereference. Fixes: f93e1cdcf42c ("net/sched: fix filter flushing") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10rtnetlink: make rtnl_register accept a flags parameterFlorian Westphal1-3/+3
This change allows us to later indicate to rtnetlink core that certain doit functions should be called without acquiring rtnl_mutex. This change should have no effect, we simply replace the last (now unused) calcit argument with the new flag. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net_sched: get rid of some forward declarationsWANG Cong1-111/+103
If we move up tcf_fill_node() we can get rid of these forward declarations. Also, move down tfilter_notify_chain() to group them together. Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net_sched: use void pointer for filter handleWANG Cong1-9/+8
Now we use 'unsigned long fh' as a pointer in every place, it is safe to convert it to a void pointer now. This gets rid of many casts to pointer. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net_sched: refactor notification code for RTM_DELTFILTERWANG Cong1-5/+39
It is confusing to use 'unsigned long fh' as both a handle and a pointer, especially commit 9ee7837449b3 ("net sched filters: fix notification of filter delete with proper handle"). This patch introduces tfilter_del_notify() so that we can pass it as a pointer as before, and we don't need to check RTM_DELTFILTER in tcf_fill_node() any more. This prepares for the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04net: sched: avoid atomic swap in tcf_exts_changeJiri Pirko1-8/+2
tcf_exts_change is always called on newly created exts, which are not used on fastpath. Therefore, simple struct copy is enough. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04net: sched: use tcf_exts_has_actions instead of exts->nr_actionsJiri Pirko1-1/+1
For check in tcf_exts_dump use tcf_exts_has_actions helper instead of exts->nr_actions for checking if there are any actions present. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04net: sched: change names of action number helpers to be aligned with the restJiri Pirko1-1/+1
The rest of the helpers are named tcf_exts_*, so change the name of the action number helpers to be aligned. While at it, change to inline functions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25net_sched: only create filter chains for new filters/actionsWANG Cong1-4/+9
tcf_chain_get() always creates a new filter chain if not found in existing ones. This is totally unnecessary when we get or delete filters, new chain should be only created for new filters (or new actions). Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25net: sched: cls_api: make reclassify return all the way back to the original tpJiri Pirko1-3/+5
With the introduction of chain goto action, the reclassification would cause the re-iteration of the actual chain. It makes more sense to restart the whole thing and re-iterate starting from the original tp - start of chain 0. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-23net/sched: fix filter flushingJiri Pirko1-3/+10
When user instructs to remove all filters from chain, we cannot destroy the chain as other actions may hold a reference. Also the put in errout would try to destroy it again. So instead, just walk the chain and remove all existing filters. Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-23net/sched: properly assign RCU pointer in tcf_chain_tp_insert/removeJiri Pirko1-2/+2
*p_filter_chain is rcu-dereferenced on reader path. So here in writer, property assign the pointer. Fixes: 2190d1d0944f ("net: sched: introduce helpers to work with filter chains") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: add termination action to allow goto chainJiri Pirko1-1/+5
Introduce new type of termination action called "goto_chain". This allows user to specify a chain to be processed. This action type is then processed as a return value in tcf_classify loop in similar way as "reclassify" is, only it does not reset to the first filter in chain but rather reset to the first filter of the desired chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: push tp down to action initJiri Pirko1-4/+5
Tp pointer will be needed by the next patch in order to get the chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: introduce multichain support for filtersJiri Pirko1-16/+88
Instead of having only one filter per block, introduce a list of chains for every block. Create chain 0 by default. UAPI is extended so the user can specify which chain he wants to change. If the new attribute is not specified, chain 0 is used. That allows to maintain backward compatibility. If chain does not exist and user wants to manipulate with it, new chain is created with specified index. Also, when last filter is removed from the chain, the chain is destroyed. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: push chain dump to a separate functionJiri Pirko1-43/+52
Since there will be multiple chains to dump, push chain dumping code to a separate function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: introduce helpers to work with filter chainsJiri Pirko1-41/+107
Introduce struct tcf_chain object and set of helpers around it. Wraps up insertion, deletion and search in the filter chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: move TC_H_MAJ macro call into tcf_auto_prioJiri Pirko1-2/+2
Call the helper from the function rather than to always adjust the return value of the function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: replace nprio by a bool to make the function more readableJiri Pirko1-6/+7
The use of "nprio" variable in tc_ctl_tfilter is a bit cryptic and makes a reader wonder what is going on for a while. So help him to understand this priority allocation dance a litte bit better. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: rename tcf_destroy_chain helperJiri Pirko1-3/+3
Make the name consistent with the rest of the helpers around. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: introduce tcf block infractructureJiri Pirko1-10/+38
Currently, the filter chains are direcly put into the private structures of qdiscs. In order to be able to have multiple chains per qdisc and to allow filter chains sharing among qdiscs, there is a need for common object that would hold the chains. This introduces such object and calls it "tcf_block". Helpers to get and put the blocks are provided to be called from individual qdisc code. Also, the original filter_list pointers are left in qdisc privs to allow the entry into tcf_block processing without any added overhead of possible multiple pointer dereference on fast path. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17net: sched: move tc_classify function to cls_api.cJiri Pirko1-0/+48
Move tc_classify function to cls_api.c where it belongs, rename it to fit the namespace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net_sched: move the empty tp check from ->destroy() to ->delete()WANG Cong1-13/+14
We could have a race condition where in ->classify() path we dereference tp->root and meanwhile a parallel ->destroy() makes it a NULL. Daniel cured this bug in commit d936377414fa ("net, sched: respect rcu grace period on cls destruction"). This happens when ->destroy() is called for deleting a filter to check if we are the last one in tp, this tp is still linked and visible at that time. The root cause of this problem is the semantic of ->destroy(), it does two things (for non-force case): 1) check if tp is empty 2) if tp is empty we could really destroy it and its caller, if cares, needs to check its return value to see if it is really destroyed. Therefore we can't unlink tp unless we know it is empty. As suggested by Daniel, we could actually move the test logic to ->delete() so that we can safely unlink tp after ->delete() tells us the last one is just deleted and before ->destroy(). Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Cc: Roi Dayan <roid@mellanox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: rtnetlink: plumb extended ack to doit functionDavid Ahern1-2/+3
Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. >From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13netlink: pass extended ACK struct to parsing functionsJohannes Berg1-1/+1
Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14sched: Fix accidental removal of errout gotoJiri Pirko1-0/+1
Bring back the goto that was removed by accident. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: 40c81b25b16c ("sched: check negative err value to safe one level of indent") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: check negative err value to safe one level of indentJiri Pirko1-13/+9
As it is more common, check err for !0. That allows to safe one level of indentation and makes the code easier to read. Also, make 'next' variable global in function as it is used twice. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: add missing curly braces in else branch in tc_ctl_tfilterJiri Pirko1-1/+2
Curly braces need to be there, for stylistic reasons. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: move err set right before goto errout in tc_ctl_tfilterJiri Pirko1-10/+19
This makes the reader to know right away what is the error value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: push TC filter protocol creation into a separate functionJiri Pirko1-51/+59
Make the long function tc_ctl_tfilter a little bit shorter and easier to read. Also make the creation of filter proto symmetric to destruction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>