summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2019-12-05net: sched: fix `tc -s class show` no bstats on class with nolock subqueuesDust Li4-5/+6
[ Upstream commit 14e54ab9143fa60794d13ea0a66c792a2046a8f3 ] When a classful qdisc's child qdisc has set the flag TCQ_F_CPUSTATS (pfifo_fast for example), the child qdisc's cpu_bstats should be passed to gnet_stats_copy_basic(), but many classful qdisc didn't do that. As a result, `tc -s class show dev DEV` always return 0 for bytes and packets in this case. Pass the child qdisc's cpu_bstats to gnet_stats_copy_basic() to fix this issue. The qstats also has this problem, but it has been fixed in 5dd431b6b9 ("net: sched: introduce and use qstats read...") and bstats still remains buggy. Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29taprio: don't reject same mqprio settingsIvan Khoronzhuk1-2/+26
[ Upstream commit b5a0faa3572ac70bd374bd66190ac3ad4fddab20 ] The taprio qdisc allows to set mqprio setting but only once. In case if mqprio settings are provided next time the error is returned as it's not allowed to change traffic class mapping in-flignt and that is normal. But if configuration is absolutely the same - no need to return error. It allows to provide same command couple times, changing only base time for instance, or changing only scheds maps, but leaving mqprio setting w/o modification. It more corresponds the message: "Changing the traffic mapping of a running schedule is not supported", so reject mqprio if it's really changed. Also corrected TC_BITMASK + 1 for consistency, as proposed. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29net: sched: ensure opts_len <= IP_TUNNEL_OPTS_MAX in act_tunnel_keyXin Long1-0/+4
[ Upstream commit 4f0e97d070984d487df027f163e52bb72d1713d8 ] info->options_len is 'u8' type, and when opts_len with a value > IP_TUNNEL_OPTS_MAX, 'info->options_len = opts_len' will cast int to u8 and set a wrong value to info->options_len. Kernel crashed in my test when doing: # opts="0102:80:00800022" # for i in {1..99}; do opts="$opts,0102:80:00800022"; done # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 ip_proto udp action tunnel_key \ set src_ip 10.0.99.192 dst_ip 10.0.99.193 \ dst_port 6081 id 11 geneve_opts $opts \ action mirred egress redirect dev geneve0 So we should do the similar check as cls_flower does, return error when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts(). Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29net/sched: act_pedit: fix WARN() in the traffic pathDavide Caratti1-7/+5
[ Upstream commit f67169fef8dbcc1ac6a6a109ecaad0d3b259002c ] when configuring act_pedit rules, the number of keys is validated only on addition of a new entry. This is not sufficient to avoid hitting a WARN() in the traffic path: for example, it is possible to replace a valid entry with a new one having 0 extended keys, thus causing splats in dmesg like: pedit BUG: index 42 WARNING: CPU: 2 PID: 4054 at net/sched/act_pedit.c:410 tcf_pedit_act+0xc84/0x1200 [act_pedit] [...] RIP: 0010:tcf_pedit_act+0xc84/0x1200 [act_pedit] Code: 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ac 00 00 00 48 8b 44 24 10 48 c7 c7 a0 c4 e4 c0 8b 70 18 e8 1c 30 95 ea <0f> 0b e9 a0 fa ff ff e8 00 03 f5 ea e9 14 f4 ff ff 48 89 58 40 e9 RSP: 0018:ffff888077c9f320 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffac2983a2 RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888053927bec RBP: dffffc0000000000 R08: ffffed100a726209 R09: ffffed100a726209 R10: 0000000000000001 R11: ffffed100a726208 R12: ffff88804beea780 R13: ffff888079a77400 R14: ffff88804beea780 R15: ffff888027ab2000 FS: 00007fdeec9bd740(0000) GS:ffff888053900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffdb3dfd000 CR3: 000000004adb4006 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x105/0x3f0 tcf_classify+0xf2/0x410 __dev_queue_xmit+0xcbf/0x2ae0 ip_finish_output2+0x711/0x1fb0 ip_output+0x1bf/0x4b0 ip_send_skb+0x37/0xa0 raw_sendmsg+0x180c/0x2430 sock_sendmsg+0xdb/0x110 __sys_sendto+0x257/0x2b0 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0xa5/0x4e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fdeeb72e993 Code: 48 8b 0d e0 74 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 0d d6 2c 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 4b cc 00 00 48 89 04 24 RSP: 002b:00007ffdb3de8a18 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055c81972b700 RCX: 00007fdeeb72e993 RDX: 0000000000000040 RSI: 000055c81972b700 RDI: 0000000000000003 RBP: 00007ffdb3dea130 R08: 000055c819728510 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 000055c81972b6c0 R14: 000055c81972969c R15: 0000000000000080 Fix this moving the check on 'nkeys' earlier in tcf_pedit_init(), so that attempts to install rules having 0 keys are always rejected with -EINVAL. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12net: sched: prevent duplicate flower rules from tcf_proto destroy raceJohn Hurley1-4/+79
[ Upstream commit 59eb87cb52c9f7164804bc8639c4d03ba9b0c169 ] When a new filter is added to cls_api, the function tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to determine if the tcf_proto is duplicated in the chain's hashtable. It then creates a new entry or continues with an existing one. In cls_flower, this allows the function fl_ht_insert_unque to determine if a filter is a duplicate and reject appropriately, meaning that the duplicate will not be passed to drivers via the offload hooks. However, when a tcf_proto is destroyed it is removed from its chain before a hardware remove hook is hit. This can lead to a race whereby the driver has not received the remove message but duplicate flows can be accepted. This, in turn, can lead to the offload driver receiving incorrect duplicate flows and out of order add/delete messages. Prevent duplicates by utilising an approach suggested by Vlad Buslov. A hash table per block stores each unique chain/protocol/prio being destroyed. This entry is only removed when the full destroy (and hardware offload) has completed. If a new flow is being added with the same identiers as a tc_proto being detroyed, then the add request is replayed until the destroy is complete. Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution") Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reported-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10net/flow_dissector: switch to siphashEric Dumazet3-16/+19
commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 upstream. UDP IPv6 packets auto flowlabels are using a 32bit secret (static u32 hashrnd in net/core/flow_dissector.c) and apply jhash() over fields known by the receivers. Attackers can easily infer the 32bit secret and use this information to identify a device and/or user, since this 32bit secret is only set at boot time. Really, using jhash() to generate cookies sent on the wire is a serious security concern. Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be a dead end. Trying to periodically change the secret (like in sch_sfq.c) could change paths taken in the network for long lived flows. Let's switch to siphash, as we did in commit df453700e8d8 ("inet: switch IP ID generator to siphash") Using a cryptographically strong pseudo random function will solve this privacy issue and more generally remove other weak points in the stack. Packet schedulers using skb_get_hash_perturb() benefit from this change. Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default") Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels") Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Berger <jonathann1@walla.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10net: netem: correct the parent's backlog when corrupted packet was droppedJakub Kicinski1-0/+2
[ Upstream commit e0ad032e144731a5928f2d75e91c2064ba1a764c ] If packet corruption failed we jump to finish_segs and return NET_XMIT_SUCCESS. Seeing success will make the parent qdisc increment its backlog, that's incorrect - we need to return NET_XMIT_DROP. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10net: netem: fix error path for corrupted GSO framesJakub Kicinski1-3/+6
[ Upstream commit a7fa12d15855904aff1716e1fc723c03ba38c5cc ] To corrupt a GSO frame we first perform segmentation. We then proceed using the first segment instead of the full GSO skb and requeue the rest of the segments as separate packets. If there are any issues with processing the first segment we still want to process the rest, therefore we jump to the finish_segs label. Commit 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") started using the pointer to the first segment in the "rest of segments processing", but as mentioned above the first segment may had already been freed at this point. Backlog corrections for parent qdiscs have to be adjusted. Fixes: 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06net: sched: sch_sfb: don't call qdisc_put() while holding tree lockVlad Buslov1-3/+4
commit e3ae1f96accd21405715fe9c56b4d83bc7d96d44 upstream. Recent changes that removed rtnl dependency from rules update path of tc also made tcf_block_put() function sleeping. This function is called from ops->destroy() of several Qdisc implementations, which in turn is called by qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock, which results sleeping-while-atomic BUG. Steps to reproduce for sfb: tc qdisc add dev ens1f0 handle 1: root sfb tc qdisc add dev ens1f0 parent 1:10 handle 50: sfq perturb 10 tc qdisc change dev ens1f0 root handle 1: sfb Resulting dmesg: [ 7265.938717] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909 [ 7265.940152] in_atomic(): 1, irqs_disabled(): 0, pid: 28579, name: tc [ 7265.941455] INFO: lockdep is turned off. [ 7265.942744] CPU: 11 PID: 28579 Comm: tc Tainted: G W 5.3.0-rc8+ #721 [ 7265.944065] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 7265.945396] Call Trace: [ 7265.946709] dump_stack+0x85/0xc0 [ 7265.947994] ___might_sleep.cold+0xac/0xbc [ 7265.949282] __mutex_lock+0x5b/0x960 [ 7265.950543] ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0 [ 7265.951803] ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0 [ 7265.953022] tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0 [ 7265.954248] tcf_block_put_ext.part.0+0x21/0x50 [ 7265.955478] tcf_block_put+0x50/0x70 [ 7265.956694] sfq_destroy+0x15/0x50 [sch_sfq] [ 7265.957898] qdisc_destroy+0x5f/0x160 [ 7265.959099] sfb_change+0x175/0x330 [sch_sfb] [ 7265.960304] tc_modify_qdisc+0x324/0x840 [ 7265.961503] rtnetlink_rcv_msg+0x170/0x4b0 [ 7265.962692] ? netlink_deliver_tap+0x95/0x400 [ 7265.963876] ? rtnl_dellink+0x2d0/0x2d0 [ 7265.965064] netlink_rcv_skb+0x49/0x110 [ 7265.966251] netlink_unicast+0x171/0x200 [ 7265.967427] netlink_sendmsg+0x224/0x3f0 [ 7265.968595] sock_sendmsg+0x5e/0x60 [ 7265.969753] ___sys_sendmsg+0x2ae/0x330 [ 7265.970916] ? ___sys_recvmsg+0x159/0x1f0 [ 7265.972074] ? do_wp_page+0x9c/0x790 [ 7265.973233] ? __handle_mm_fault+0xcd3/0x19e0 [ 7265.974407] __sys_sendmsg+0x59/0xa0 [ 7265.975591] do_syscall_64+0x5c/0xb0 [ 7265.976753] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 7265.977938] RIP: 0033:0x7f229069f7b8 [ 7265.979117] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5 4 [ 7265.981681] RSP: 002b:00007ffd7ed2d158 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 7265.983001] RAX: ffffffffffffffda RBX: 000000005d813ca1 RCX: 00007f229069f7b8 [ 7265.984336] RDX: 0000000000000000 RSI: 00007ffd7ed2d1c0 RDI: 0000000000000003 [ 7265.985682] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000165c9a0 [ 7265.987021] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001 [ 7265.988309] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000 In sfb_change() function use qdisc_purge_queue() instead of qdisc_tree_flush_backlog() to properly reset old child Qdisc and save pointer to it into local temporary variable. Put reference to Qdisc after sch tree lock is released in order not to call potentially sleeping cls API in atomic section. This is safe to do because Qdisc has already been reset by qdisc_purge_queue() inside sch tree lock critical section. Reported-by: syzbot+ac54455281db908c581e@syzkaller.appspotmail.com Fixes: c266f64dbfa2 ("net: sched: protect block state with mutex") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06sch_netem: fix rcu splat in netem_enqueue()Eric Dumazet1-1/+1
commit 159d2c7d8106177bd9a986fd005a311fe0d11285 upstream. qdisc_root() use from netem_enqueue() triggers a lockdep warning. __dev_queue_xmit() uses rcu_read_lock_bh() which is not equivalent to rcu_read_lock() + local_bh_disable_bh as far as lockdep is concerned. WARNING: suspicious RCU usage 5.3.0-rc7+ #0 Not tainted ----------------------------- include/net/sch_generic.h:492 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by syz-executor427/8855: #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: lwtunnel_xmit_redirect include/net/lwtunnel.h:92 [inline] #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x2dc/0x2570 net/ipv4/ip_output.c:214 #1: 00000000b5525c01 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x20a/0x3650 net/core/dev.c:3804 #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline] #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_xmit_skb net/core/dev.c:3502 [inline] #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_queue_xmit+0x14b8/0x3650 net/core/dev.c:3838 stack backtrace: CPU: 0 PID: 8855 Comm: syz-executor427 Not tainted 5.3.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5357 qdisc_root include/net/sch_generic.h:492 [inline] netem_enqueue+0x1cfb/0x2d80 net/sched/sch_netem.c:479 __dev_xmit_skb net/core/dev.c:3527 [inline] __dev_queue_xmit+0x15d2/0x3650 net/core/dev.c:3838 dev_queue_xmit+0x18/0x20 net/core/dev.c:3902 neigh_hh_output include/net/neighbour.h:500 [inline] neigh_output include/net/neighbour.h:509 [inline] ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:308 [inline] __ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290 ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip_mc_output+0x292/0xf40 net/ipv4/ip_output.c:417 dst_output include/net/dst.h:436 [inline] ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125 ip_send_skb+0x42/0xf0 net/ipv4/ip_output.c:1555 udp_send_skb.isra.0+0x6b2/0x1160 net/ipv4/udp.c:887 udp_sendmsg+0x1e96/0x2820 net/ipv4/udp.c:1174 inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actionsDavide Caratti1-4/+8
[ Upstream commit fa4e0f8855fcba600e0be2575ee29c69166f74bd ] the following script: # tc qdisc add dev eth0 clsact # tc filter add dev eth0 egress protocol ip matchall \ > action mpls push protocol mpls_uc label 0x355aa bos 1 causes corruption of all IP packets transmitted by eth0. On TC egress, we can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push' operation will result in an overwrite of the first 4 octets in the packet L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same error pattern is present also in the MPLS 'pop' operation. Fix this error in act_mpls data plane, computing 'mac_len' as the difference between the network header and the mac header (when not at TC ingress), and use it in MPLS 'push'/'pop' core functions. v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len' in skb_mpls_pop(), reported by kbuild test robot CC: Lorenzo Bianconi <lorenzo@kernel.org> Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29sched: etf: Fix ordering of packets with same txtimeVinicius Costa Gomes1-1/+1
[ Upstream commit 28aa7c86c2b49f659c8460a89e53b506c45979bb ] When a application sends many packets with the same txtime, they may be transmitted out of order (different from the order in which they were enqueued). This happens because when inserting elements into the tree, when the txtime of two packets are the same, the new packet is inserted at the left side of the tree, causing the reordering. The only effect of this change should be that packets with the same txtime will be transmitted in the order they are enqueued. The application in question (the AVTP GStreamer plugin, still in development) is sending video traffic, in which each video frame have a single presentation time, the problem is that when packetizing, multiple packets end up with the same txtime. The receiving side was rejecting packets because they were being received out of order. Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc") Reported-by: Ederson de Souza <ederson.desouza@intel.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29net: avoid potential infinite loop in tc_ctl_action()Eric Dumazet1-6/+8
[ Upstream commit 39f13ea2f61b439ebe0060393e9c39925c9ee28c ] tc_ctl_action() has the ability to loop forever if tcf_action_add() returns -EAGAIN. This special case has been done in case a module needed to be loaded, but it turns out that tcf_add_notify() could also return -EAGAIN if the socket sk_rcvbuf limit is hit. We need to separate the two cases, and only loop for the module loading case. While we are at it, add a limit of 10 attempts since unbounded loops are always scary. syzbot repro was something like : socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3 write(3, ..., 38) = 38 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0 sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...) NMI backtrace for cpu 0 CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline] check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline] watchdog+0x9d0/0xef0 kernel/hung_task.c:289 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline] RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453 Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 <48> 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6 RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046 RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914 RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1 R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003 FS: 00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline] _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159 __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122 __wake_up+0xe/0x10 kernel/sched/wait.c:142 netlink_unlock_table net/netlink/af_netlink.c:466 [inline] netlink_unlock_table net/netlink/af_netlink.c:463 [inline] netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514 netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534 rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714 tcf_add_notify net/sched/act_api.c:1343 [inline] tcf_action_add+0x243/0x370 net/sched/act_api.c:1362 tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410 rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x803/0x920 net/socket.c:2311 __sys_sendmsg+0x105/0x1d0 net/socket.c:2356 __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg net/socket.c:2363 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440939 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29net_sched: fix backward compatibility for TCA_ACT_KINDCong Wang1-4/+5
[ Upstream commit 4b793feccae3b06764268377a4030eb774ed924e ] For TCA_ACT_KIND, we have to keep the backward compatibility too, and rely on nla_strlcpy() to check and terminate the string with a NUL. Note for TC actions, nla_strcmp() is already used to compare kind strings, so we don't need to fix other places. Fixes: 199ce850ce11 ("net_sched: add policy validation for action attributes") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-29net_sched: fix backward compatibility for TCA_KINDCong Wang2-5/+34
[ Upstream commit 6f96c3c6904c26cea9ca2726d5d8a9b0b8205b3c ] Marcelo noticed a backward compatibility issue of TCA_KIND after we move from NLA_STRING to NLA_NUL_STRING, so it is probably too late to change it. Instead, to make everyone happy, we can just insert a NUL to terminate the string with nla_strlcpy() like we do for TC actions. Fixes: 62794fc4fbf5 ("net_sched: add max len check for TCA_KIND") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07net: sched: taprio: Avoid division by zero on invalid link speedVladimir Oltean1-1/+1
[ Upstream commit 9a9251a3534745d08a92abfeca0ca467b912b5f6 ] The check in taprio_set_picos_per_byte is currently not robust enough and will trigger this division by zero, due to e.g. PHYLINK not setting kset->base.speed when there is no PHY connected: [ 27.109992] Division by zero in kernel. [ 27.113842] CPU: 1 PID: 198 Comm: tc Not tainted 5.3.0-rc5-01246-gc4006b8c2637-dirty #212 [ 27.121974] Hardware name: Freescale LS1021A [ 27.126234] [<c03132e0>] (unwind_backtrace) from [<c030d8b8>] (show_stack+0x10/0x14) [ 27.133938] [<c030d8b8>] (show_stack) from [<c10b21b0>] (dump_stack+0xb0/0xc4) [ 27.141124] [<c10b21b0>] (dump_stack) from [<c10af97c>] (Ldiv0_64+0x8/0x18) [ 27.148052] [<c10af97c>] (Ldiv0_64) from [<c0700260>] (div64_u64+0xcc/0xf0) [ 27.154978] [<c0700260>] (div64_u64) from [<c07002d0>] (div64_s64+0x4c/0x68) [ 27.161993] [<c07002d0>] (div64_s64) from [<c0f3d890>] (taprio_set_picos_per_byte+0xe8/0xf4) [ 27.170388] [<c0f3d890>] (taprio_set_picos_per_byte) from [<c0f3f614>] (taprio_change+0x668/0xcec) [ 27.179302] [<c0f3f614>] (taprio_change) from [<c0f2bc24>] (qdisc_create+0x1fc/0x4f4) [ 27.187091] [<c0f2bc24>] (qdisc_create) from [<c0f2c0c8>] (tc_modify_qdisc+0x1ac/0x6f8) [ 27.195055] [<c0f2c0c8>] (tc_modify_qdisc) from [<c0ee9604>] (rtnetlink_rcv_msg+0x268/0x2dc) [ 27.203449] [<c0ee9604>] (rtnetlink_rcv_msg) from [<c0f4fef0>] (netlink_rcv_skb+0xe0/0x114) [ 27.211756] [<c0f4fef0>] (netlink_rcv_skb) from [<c0f4f6cc>] (netlink_unicast+0x1b4/0x22c) [ 27.219977] [<c0f4f6cc>] (netlink_unicast) from [<c0f4fa84>] (netlink_sendmsg+0x284/0x340) [ 27.228198] [<c0f4fa84>] (netlink_sendmsg) from [<c0eae5fc>] (sock_sendmsg+0x14/0x24) [ 27.235988] [<c0eae5fc>] (sock_sendmsg) from [<c0eaedf8>] (___sys_sendmsg+0x214/0x228) [ 27.243863] [<c0eaedf8>] (___sys_sendmsg) from [<c0eb015c>] (__sys_sendmsg+0x50/0x8c) [ 27.251652] [<c0eb015c>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54) [ 27.259524] Exception stack(0xe8045fa8 to 0xe8045ff0) [ 27.264546] 5fa0: b6f608c8 000000f8 00000003 bed7e2f0 00000000 00000000 [ 27.272681] 5fc0: b6f608c8 000000f8 004ce54c 00000128 5d3ce8c7 00000000 00000026 00505c9c [ 27.280812] 5fe0: 00000070 bed7e298 004ddd64 b6dd1e64 Russell King points out that the ethtool API says zero is a valid return value of __ethtool_get_link_ksettings: * If it is enabled then they are read-only; if the link * is up they represent the negotiated link mode; if the link is down, * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode. So, it seems that taprio is not following the API... I'd suggest either fixing taprio, or getting agreement to change the ethtool API. The chosen path was to fix taprio. Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07net: sched: cbs: Avoid division by zero when calculating the port rateVladimir Oltean1-1/+1
[ Upstream commit 83c8c3cf45163f0c823db37be6ab04dfcf8ac751 ] As explained in the "net: sched: taprio: Avoid division by zero on invalid link speed" commit, it is legal for the ethtool API to return zero as a link speed. So guard against it to ensure we don't perform a division by zero in kernel. Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07sch_dsmark: fix potential NULL deref in dsmark_init()Eric Dumazet1-0/+2
[ Upstream commit 474f0813a3002cb299bb73a5a93aa1f537a80ca8 ] Make sure TCA_DSMARK_INDICES was provided by the user. syzbot reported : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline] RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline] RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339 Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09 RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004 RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159 R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940 R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000 FS: 0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237 tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653 rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x803/0x920 net/socket.c:2311 __sys_sendmsg+0x105/0x1d0 net/socket.c:2356 __do_sys_sendmsg net/socket.c:2365 [inline] __se_sys_sendmsg net/socket.c:2363 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440369 Fixes: 758cc43c6d73 ("[PKT_SCHED]: Fix dsmark to apply changes consistent") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07sch_cbq: validate TCA_CBQ_WRROPT to avoid crashEric Dumazet1-14/+29
[ Upstream commit e9789c7cc182484fc031fd88097eb14cb26c4596 ] syzbot reported a crash in cbq_normalize_quanta() caused by an out of range cl->priority. iproute2 enforces this check, but malicious users do not. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI Modules linked in: CPU: 1 PID: 26447 Comm: syz-executor.1 Not tainted 5.3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:cbq_normalize_quanta.part.0+0x1fd/0x430 net/sched/sch_cbq.c:902 RSP: 0018:ffff8801a5c333b0 EFLAGS: 00010206 RAX: 0000000020000003 RBX: 00000000fffffff8 RCX: ffffc9000712f000 RDX: 00000000000043bf RSI: ffffffff83be8962 RDI: 0000000100000018 RBP: ffff8801a5c33420 R08: 000000000000003a R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000002ef R13: ffff88018da95188 R14: dffffc0000000000 R15: 0000000000000015 FS: 00007f37d26b1700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c7cec CR3: 00000001bcd0a006 CR4: 00000000001626f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff83be9d57>] cbq_normalize_quanta include/net/pkt_sched.h:27 [inline] [<ffffffff83be9d57>] cbq_addprio net/sched/sch_cbq.c:1097 [inline] [<ffffffff83be9d57>] cbq_set_wrr+0x2d7/0x450 net/sched/sch_cbq.c:1115 [<ffffffff83bee8a7>] cbq_change_class+0x987/0x225b net/sched/sch_cbq.c:1537 [<ffffffff83b96985>] tc_ctl_tclass+0x555/0xcd0 net/sched/sch_api.c:2329 [<ffffffff83a84655>] rtnetlink_rcv_msg+0x485/0xc10 net/core/rtnetlink.c:5248 [<ffffffff83cadf0a>] netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2510 [<ffffffff83a7db6d>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5266 [<ffffffff83cac2c6>] netlink_unicast_kernel net/netlink/af_netlink.c:1324 [inline] [<ffffffff83cac2c6>] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1350 [<ffffffff83cacd4a>] netlink_sendmsg+0x89a/0xd50 net/netlink/af_netlink.c:1939 [<ffffffff8399d46e>] sock_sendmsg_nosec net/socket.c:673 [inline] [<ffffffff8399d46e>] sock_sendmsg+0x12e/0x170 net/socket.c:684 [<ffffffff8399f1fd>] ___sys_sendmsg+0x81d/0x960 net/socket.c:2359 [<ffffffff839a2d05>] __sys_sendmsg+0x105/0x1d0 net/socket.c:2397 [<ffffffff839a2df9>] SYSC_sendmsg net/socket.c:2406 [inline] [<ffffffff839a2df9>] SyS_sendmsg+0x29/0x30 net/socket.c:2404 [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305 [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07net: sched: taprio: Fix potential integer overflow in taprio_set_picos_per_byteVladimir Oltean1-2/+1
[ Upstream commit 68ce6688a5baefde30914fc07fc27292dbbe8320 ] The speed divisor is used in a context expecting an s64, but it is evaluated using 32-bit arithmetic. To avoid that happening, instead of multiplying by 1,000,000 in the first place, simplify the fraction and do a standard 32 bit division instead. Fixes: f04b514c0ce2 ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte") Reported-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05net: sched: fix possible crash in tcf_action_destroy()Eric Dumazet1-2/+4
[ Upstream commit 3d66b89c30f9220a72e92847768fc8ba4d027d88 ] If the allocation done in tcf_exts_init() failed, we end up with a NULL pointer in exts->actions. kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8198 Comm: syz-executor.3 Not tainted 5.3.0-rc8+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcf_action_destroy+0x71/0x160 net/sched/act_api.c:705 Code: c3 08 44 89 ee e8 4f cb bb fb 41 83 fd 20 0f 84 c9 00 00 00 e8 c0 c9 bb fb 48 89 d8 48 b9 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 08 00 0f 85 c0 00 00 00 4c 8b 33 4d 85 f6 0f 84 9d 00 00 00 RSP: 0018:ffff888096e16ff0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000 RDX: 0000000000040000 RSI: ffffffff85b6ab30 RDI: 0000000000000000 RBP: ffff888096e17020 R08: ffff8880993f6140 R09: fffffbfff11cae67 R10: fffffbfff11cae66 R11: ffffffff88e57333 R12: 0000000000000000 R13: 0000000000000000 R14: ffff888096e177a0 R15: 0000000000000001 FS: 00007f62bc84a700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000758040 CR3: 0000000088b64000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcf_exts_destroy+0x38/0xb0 net/sched/cls_api.c:3030 tcindex_set_parms+0xf7f/0x1e50 net/sched/cls_tcindex.c:488 tcindex_change+0x230/0x318 net/sched/cls_tcindex.c:519 tc_new_tfilter+0xa4b/0x1c70 net/sched/cls_api.c:2152 rtnetlink_rcv_msg+0x838/0xb00 net/core/rtnetlink.c:5214 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05net_sched: add policy validation for action attributesCong Wang1-16/+18
[ Upstream commit 199ce850ce112315cfc68d42b694bcaa27b097b7 ] Similar to commit 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes"), we need to add proper policy validation for TC action attributes too. Cc: David Ahern <dsahern@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05net/sched: cbs: Fix not adding cbs instance to listVinicius Costa Gomes1-17/+13
[ Upstream commit 3e8b9bfa110896f95d602d8c98d5f9d67e41d78c ] When removing a cbs instance when offloading is enabled, the crash below can be observed. The problem happens because that when offloading is enabled, the cbs instance is not added to the list. Also, the current code doesn't handle correctly the case when offload is disabled without removing the qdisc: if the link speed changes the credit calculations will be wrong. When we create the cbs instance with offloading enabled, it's not added to the notification list, when later we disable offloading, it's not in the list, so link speed changes will not affect it. The solution for both issues is the same, add the cbs instance being created unconditionally to the global list, even if the link state notification isn't useful "right now". Crash log: [518758.189866] BUG: kernel NULL pointer dereference, address: 0000000000000000 [518758.189870] #PF: supervisor read access in kernel mode [518758.189871] #PF: error_code(0x0000) - not-present page [518758.189872] PGD 0 P4D 0 [518758.189874] Oops: 0000 [#1] SMP PTI [518758.189876] CPU: 3 PID: 4825 Comm: tc Not tainted 5.2.9 #1 [518758.189877] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019 [518758.189881] RIP: 0010:__list_del_entry_valid+0x29/0xa0 [518758.189883] Code: 90 48 b8 00 01 00 00 00 00 ad de 55 48 8b 17 4c 8b 47 08 48 89 e5 48 39 c2 74 27 48 b8 00 02 00 00 00 00 ad de 49 39 c0 74 2d <49> 8b 30 48 39 fe 75 3d 48 8b 52 08 48 39 f2 75 4c b8 01 00 00 00 [518758.189885] RSP: 0018:ffffa27e43903990 EFLAGS: 00010207 [518758.189887] RAX: dead000000000200 RBX: ffff8bce69f0f000 RCX: 0000000000000000 [518758.189888] RDX: 0000000000000000 RSI: ffff8bce69f0f064 RDI: ffff8bce69f0f1e0 [518758.189890] RBP: ffffa27e43903990 R08: 0000000000000000 R09: ffff8bce69e788c0 [518758.189891] R10: ffff8bce62acd400 R11: 00000000000003cb R12: ffff8bce69e78000 [518758.189892] R13: ffff8bce69f0f140 R14: 0000000000000000 R15: 0000000000000000 [518758.189894] FS: 00007fa1572c8f80(0000) GS:ffff8bce6e0c0000(0000) knlGS:0000000000000000 [518758.189895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [518758.189896] CR2: 0000000000000000 CR3: 000000040a398006 CR4: 00000000003606e0 [518758.189898] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [518758.189899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [518758.189900] Call Trace: [518758.189904] cbs_destroy+0x32/0xa0 [sch_cbs] [518758.189906] qdisc_destroy+0x45/0x120 [518758.189907] qdisc_put+0x25/0x30 [518758.189908] qdisc_graft+0x2c1/0x450 [518758.189910] tc_get_qdisc+0x1c8/0x310 [518758.189912] ? get_page_from_freelist+0x91a/0xcb0 [518758.189914] rtnetlink_rcv_msg+0x293/0x360 [518758.189916] ? kmem_cache_alloc_node_trace+0x178/0x260 [518758.189918] ? __kmalloc_node_track_caller+0x38/0x50 [518758.189920] ? rtnl_calcit.isra.0+0xf0/0xf0 [518758.189922] netlink_rcv_skb+0x48/0x110 [518758.189923] rtnetlink_rcv+0x10/0x20 [518758.189925] netlink_unicast+0x15b/0x1d0 [518758.189926] netlink_sendmsg+0x1ea/0x380 [518758.189929] sock_sendmsg+0x2f/0x40 [518758.189930] ___sys_sendmsg+0x295/0x2f0 [518758.189932] ? ___sys_recvmsg+0x151/0x1e0 [518758.189933] ? do_wp_page+0x7e/0x450 [518758.189935] __sys_sendmsg+0x48/0x80 [518758.189937] __x64_sys_sendmsg+0x1a/0x20 [518758.189939] do_syscall_64+0x53/0x1f0 [518758.189941] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [518758.189942] RIP: 0033:0x7fa15755169a [518758.189944] Code: 48 c7 c0 ff ff ff ff eb be 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 18 b8 2e 00 00 00 c5 fc 77 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 89 54 24 1c [518758.189946] RSP: 002b:00007ffda58b60b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [518758.189948] RAX: ffffffffffffffda RBX: 000055e4b836d9a0 RCX: 00007fa15755169a [518758.189949] RDX: 0000000000000000 RSI: 00007ffda58b6128 RDI: 0000000000000003 [518758.189951] RBP: 00007ffda58b6190 R08: 0000000000000001 R09: 000055e4b9d848a0 [518758.189952] R10: 0000000000000000 R11: 0000000000000246 R12: 000000005d654b49 [518758.189953] R13: 0000000000000000 R14: 00007ffda58b6230 R15: 00007ffda58b6210 [518758.189955] Modules linked in: sch_cbs sch_etf sch_mqprio netlink_diag unix_diag e1000e igb intel_pch_thermal thermal video backlight pcc_cpufreq [518758.189960] CR2: 0000000000000000 [518758.189961] ---[ end trace 6a13f7aaf5376019 ]--- [518758.189963] RIP: 0010:__list_del_entry_valid+0x29/0xa0 [518758.189964] Code: 90 48 b8 00 01 00 00 00 00 ad de 55 48 8b 17 4c 8b 47 08 48 89 e5 48 39 c2 74 27 48 b8 00 02 00 00 00 00 ad de 49 39 c0 74 2d <49> 8b 30 48 39 fe 75 3d 48 8b 52 08 48 39 f2 75 4c b8 01 00 00 00 [518758.189967] RSP: 0018:ffffa27e43903990 EFLAGS: 00010207 [518758.189968] RAX: dead000000000200 RBX: ffff8bce69f0f000 RCX: 0000000000000000 [518758.189969] RDX: 0000000000000000 RSI: ffff8bce69f0f064 RDI: ffff8bce69f0f1e0 [518758.189971] RBP: ffffa27e43903990 R08: 0000000000000000 R09: ffff8bce69e788c0 [518758.189972] R10: ffff8bce62acd400 R11: 00000000000003cb R12: ffff8bce69e78000 [518758.189973] R13: ffff8bce69f0f140 R14: 0000000000000000 R15: 0000000000000000 [518758.189975] FS: 00007fa1572c8f80(0000) GS:ffff8bce6e0c0000(0000) knlGS:0000000000000000 [518758.189976] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [518758.189977] CR2: 0000000000000000 CR3: 000000040a398006 CR4: 00000000003606e0 [518758.189979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [518758.189980] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05sch_netem: fix a divide by zero in tabledist()Eric Dumazet1-1/+1
[ Upstream commit b41d936b5ecfdb3a4abc525ce6402a6c49cffddc ] syzbot managed to crash the kernel in tabledist() loading an empty distribution table. t = dist->table[rnd % dist->size]; Simply return an error when such load is attempted. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05net_sched: add max len check for TCA_KINDCong Wang1-1/+2
[ Upstream commit 62794fc4fbf52f2209dc094ea255eaef760e7d01 ] The TCA_KIND attribute is of NLA_STRING which does not check the NUL char. KMSAN reported an uninit-value of TCA_KIND which is likely caused by the lack of NUL. Change it to NLA_NUL_STRING and add a max len too. Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes") Reported-and-tested-by: syzbot+618aacd49e8c8b8486bd@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05net/sched: act_sample: don't push mac header on ip6gre ingressDavide Caratti1-0/+1
[ Upstream commit 92974a1d006ad8b30d53047c70974c9e065eb7df ] current 'sample' action doesn't push the mac header of ingress packets if they are received by a layer 3 tunnel (like gre or sit); but it forgot to check for gre over ipv6, so the following script: # tc q a dev $d clsact # tc f a dev $d ingress protocol ip flower ip_proto icmp action sample \ > group 100 rate 1 # psample -v -g 100 dumps everything, including outer header and mac, when $d is a gre tunnel over ipv6. Fix this adding a missing label for ARPHRD_IP6GRE devices. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-21net_sched: let qdisc_put() accept NULL pointerCong Wang1-0/+3
[ Upstream commit 6efb971ba8edfbd80b666f29de12882852f095ae ] When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL pointer which leads to a crash in sfb_destroy(). Similar for sch_dsmark. Instead of fixing each separately, Linus suggested to just accept NULL pointer in qdisc_put(), which would make callers easier. (For sch_dsmark, the bug probably exists long before commit 6529eaba33f0.) Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-10sch_hhf: ensure quantum and hhf_non_hh_weight are non-zeroCong Wang1-1/+1
In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero, it would make no progress inside the loop in hhf_dequeue() thus kernel would get stuck. Fix this by checking this corner case in hhf_change(). Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10net_sched: check cops->tcf_block in tc_bind_tclass()Cong Wang1-0/+2
At least sch_red and sch_tbf don't implement ->tcf_block() while still have a non-zero tc "class". Instead of adding nop implementations to each of such qdisc's, we can just relax the check of cops->tcf_block() in tc_bind_tclass(). They don't support TC filter anyway. Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06net: sched: fix reordering issuesEric Dumazet1-2/+7
Whenever MQ is not used on a multiqueue device, we experience serious reordering problems. Bisection found the cited commit. The issue can be described this way : - A single qdisc hierarchy is shared by all transmit queues. (eg : tc qdisc replace dev eth0 root fq_codel) - When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting a different transmit queue than the one used to build a packet train, we stop building the current list and save the 'bad' skb (P1) in a special queue. (bad_txq) - When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this skb (P1), it checks if the associated transmit queues is still in frozen state. If the queue is still blocked (by BQL or NIC tx ring full), we leave the skb in bad_txq and return NULL. - dequeue_skb() calls q->dequeue() to get another packet (P2) The other packet can target the problematic queue (that we found in frozen state for the bad_txq packet), but another cpu just ran TX completion and made room in the txq that is now ready to accept new packets. - Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent at next round. In practice P2 is the lead of a big packet train (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/ To solve this problem, we have to block the dequeue process as long as the first packet in bad_txq can not be sent. Reordering issues disappear and no side effects have been seen. Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-01net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rateVladimir Oltean1-8/+11
The discussion to be made is absolutely the same as in the case of previous patch ("taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte"). Nothing is lost when setting a default. Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-01taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byteVladimir Oltean1-10/+13
The taprio budget needs to be adapted at runtime according to interface link speed. But that handling is problematic. For one thing, installing a qdisc on an interface that doesn't have carrier is not illegal. But taprio prints the following stack trace: [ 31.851373] ------------[ cut here ]------------ [ 31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4 [ 31.864566] taprio: dequeue() called with unknown picos per byte. [ 31.864570] Modules linked in: [ 31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689 [ 31.881398] Hardware name: Freescale LS1021A [ 31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14) [ 31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8) [ 31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8) [ 31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c) [ 31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4) [ 31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c) [ 31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc) [ 31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8) [ 31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8) [ 31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4) [ 31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c) [ 31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90) [ 31.977076] Exception stack(0xe8167b20 to 0xe8167b68) [ 31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960 [ 31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94 [ 31.998363] 7b60: 60070013 ffffffff [ 32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8) [ 32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414) [ 32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28) [ 32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88) [ 32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44) [ 32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470) [ 32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724) [ 32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec) [ 32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110) [ 32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c) [ 32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380) [ 32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24) [ 32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228) [ 32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c) [ 32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54) [ 32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0) [ 32.128960] 7fa0: b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000 [ 32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c [ 32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64 [ 32.150659] ---[ end trace 2139c9827c3e5177 ]--- This happens because the qdisc ->dequeue callback gets called. Which again is not illegal, the qdisc will dequeue even when the interface is up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames will be dropped further down the stack in dev_direct_xmit(). And, at the end of the day, for what? For calculating the initial budget of an interface which is non-operational at the moment and where frames will get dropped anyway. So if we can't figure out the link speed, default to SPEED_10 and move along. We can also remove the runtime check now. Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-01taprio: Fix kernel panic in taprio_destroyVladimir Oltean1-4/+4
taprio_init may fail earlier than this line: list_add(&q->taprio_list, &taprio_list); i.e. due to the net device not being multi queue. Attempting to remove q from the global taprio_list when it is not part of it will result in a kernel panic. Fix it by matching list_add and list_del better to one another in the order of operations. This way we can keep the deletion unconditional and with lower complexity - O(1). Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com> Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-29net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueueDavide Caratti1-2/+6
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that per-cpu counters are there in the error path of skb_array_produce(). Otherwise, the following splat can be seen: Unable to handle kernel paging request at virtual address 0000600dea430008 Mem abort info: ESR = 0x96000005 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e [0000600dea430008] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP [...] pstate: 10000005 (nzcV daif -PAN -UAO) pc : pfifo_fast_enqueue+0x524/0x6e8 lr : pfifo_fast_enqueue+0x46c/0x6e8 sp : ffff800d39376fe0 x29: ffff800d39376fe0 x28: 1ffff001a07d1e40 x27: ffff800d03e8f188 x26: ffff800d03e8f200 x25: 0000000000000062 x24: ffff800d393772f0 x23: 0000000000000000 x22: 0000000000000403 x21: ffff800cca569a00 x20: ffff800d03e8ee00 x19: ffff800cca569a10 x18: 00000000000000bf x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: ffff1001a726edd0 x13: 1fffe4000276a9a4 x12: 0000000000000000 x11: dfff200000000000 x10: ffff800d03e8f1a0 x9 : 0000000000000003 x8 : 0000000000000000 x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003 x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb x1 : 0000600dea430000 x0 : 0000600dea430008 Process ping (pid: 6067, stack limit = 0x00000000dc0aa557) Call trace: pfifo_fast_enqueue+0x524/0x6e8 htb_enqueue+0x660/0x10e0 [sch_htb] __dev_queue_xmit+0x123c/0x2de0 dev_queue_xmit+0x24/0x30 ip_finish_output2+0xc48/0x1720 ip_finish_output+0x548/0x9d8 ip_output+0x334/0x788 ip_local_out+0x90/0x138 ip_send_skb+0x44/0x1d0 ip_push_pending_frames+0x5c/0x78 raw_sendmsg+0xed8/0x28d0 inet_sendmsg+0xc4/0x5c0 sock_sendmsg+0xac/0x108 __sys_sendto+0x1ac/0x2a0 __arm64_sys_sendto+0xc4/0x138 el0_svc_handler+0x13c/0x298 el0_svc+0x8/0xc Code: f9402e80 d538d081 91002000 8b010000 (885f7c03) Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'. Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni <pabeni@redhat.com> CC: Stefano Brivio <sbrivio@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-29net: sched: act_sample: fix psample group handling on overwriteVlad Buslov1-1/+5
Action sample doesn't properly handle psample_group pointer in overwrite case. Following issues need to be fixed: - In tcf_sample_init() function RCU_INIT_POINTER() is used to set s->psample_group, even though we neither setting the pointer to NULL, nor preventing concurrent readers from accessing the pointer in some way. Use rcu_swap_protected() instead to safely reset the pointer. - Old value of s->psample_group is not released or deallocated in any way, which results resource leak. Use psample_group_put() on non-NULL value obtained with rcu_swap_protected(). - The function psample_group_put() that released reference to struct psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu grace period when deallocating it. Extend struct psample_group with rcu head and use kfree_rcu when freeing it. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-29net/sched: pfifo_fast: fix wrong dereference when qdisc is resetDavide Caratti1-4/+7
Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of 'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu counters are present when 'reset()' is called for pfifo_fast qdiscs. Otherwise, the following script: # tc q a dev lo handle 1: root htb default 100 # tc c a dev lo parent 1: classid 1:100 htb \ > rate 95Mbit ceil 100Mbit burst 64k [...] # tc f a dev lo parent 1: protocol arp basic classid 1:100 [...] # tc q a dev lo parent 1:100 handle 100: pfifo_fast [...] # tc q d dev lo root can generate the following splat: Unable to handle kernel paging request at virtual address dfff2c01bd148000 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfff2c01bd148000] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP [...] pstate: 80000005 (Nzcv daif -PAN -UAO) pc : pfifo_fast_reset+0x280/0x4d8 lr : pfifo_fast_reset+0x21c/0x4d8 sp : ffff800d09676fa0 x29: ffff800d09676fa0 x28: ffff200012ee22e4 x27: dfff200000000000 x26: 0000000000000000 x25: ffff800ca0799958 x24: ffff1001940f332b x23: 0000000000000007 x22: ffff200012ee1ab8 x21: 0000600de8a40000 x20: 0000000000000000 x19: ffff800ca0799900 x18: 0000000000000000 x17: 0000000000000002 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: ffff1001b922e6e2 x11: 1ffff001b922e6e1 x10: 0000000000000000 x9 : 1ffff001b922e6e1 x8 : dfff200000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 1fffe400025dc45c x4 : 1fffe400025dc357 x3 : 00000c01bd148000 x2 : 0000600de8a40000 x1 : 0000000000000007 x0 : 0000600de8a40004 Call trace: pfifo_fast_reset+0x280/0x4d8 qdisc_reset+0x6c/0x370 htb_reset+0x150/0x3b8 [sch_htb] qdisc_reset+0x6c/0x370 dev_deactivate_queue.constprop.5+0xe0/0x1a8 dev_deactivate_many+0xd8/0x908 dev_deactivate+0xe4/0x190 qdisc_graft+0x88c/0xbd0 tc_get_qdisc+0x418/0x8a8 rtnetlink_rcv_msg+0x3a8/0xa78 netlink_rcv_skb+0x18c/0x328 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x3c4/0x538 netlink_sendmsg+0x538/0x9a0 sock_sendmsg+0xac/0xf8 ___sys_sendmsg+0x53c/0x658 __sys_sendmsg+0xc8/0x140 __arm64_sys_sendmsg+0x74/0xa8 el0_svc_handler+0x164/0x468 el0_svc+0x10/0x14 Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863) Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags', before dereferencing 'qdisc->cpu_qstats'. Changes since v1: - coding style improvements, thanks to Stefano Brivio Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too") CC: Paolo Abeni <pabeni@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28net_sched: fix a NULL pointer deref in ipt actionCong Wang19-23/+24
The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target(). The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by: itugrok@yahoo.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09net sched: update skbedit action for batched events operationsRoman Mashak1-0/+12
Add get_fill_size() routine used to calculate the action size when building a batch of events. Fixes: ca9b0e27e ("pkt_action: add new action skbedit") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09net: sched: sch_taprio: fix memleak in error path for sched list parseIvan Khoronzhuk1-1/+2
In error case, all entries should be freed from the sched list before deleting it. For simplicity use rcu way. Fixes: 5a781ccbd19e46 ("tc: Add support for configuring the taprio scheduler") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-07net sched: update vlan action for batched events operationsRoman Mashak1-0/+9
Add get_fill_size() routine used to calculate the action size when building a batch of events. Fixes: c7e2b9689 ("sched: introduce vlan action") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05net: sched: use temporary variable for actions indexesDmytro Linkin18-75/+100
Currently init call of all actions (except ipt) init their 'parm' structure as a direct pointer to nla data in skb. This leads to race condition when some of the filter actions were initialized successfully (and were assigned with idr action index that was written directly into nla data), but then were deleted and retried (due to following action module missing or classifier-initiated retry), in which case action init code tries to insert action to idr with index that was assigned on previous iteration. During retry the index can be reused by another action that was inserted concurrently, which causes unintended action sharing between filters. To fix described race condition, save action idr index to temporary stack-allocated variable instead on nla data. Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: sched: Fix a possible null-pointer dereference in dequeue_func()Jia-Ju Bai1-3/+3
In dequeue_func(), there is an if statement on line 74 to check whether skb is NULL: if (skb) When skb is NULL, it is used on line 77: prefetch(&skb->end); Thus, a possible null-pointer dereference may occur. To fix this bug, skb->end is used when skb is not NULL. This bug is found by a static analysis tool STCheck written by us. Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25ife: error out when nla attributes are emptyCong Wang1-0/+5
act_ife at least requires TCA_IFE_PARMS, so we have to bail out when there is no attribute passed in. Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com Fixes: ef6980b6becb ("introduce IFE action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21net: sched: verify that q!=NULL before setting q->flagsVlad Buslov1-1/+3
In function int tc_new_tfilter() q pointer can be NULL when adding filter on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after filter creation, following NULL pointer dereference happens in case parent block is shared: [ 212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 212.925445] #PF: supervisor write access in kernel mode [ 212.925709] #PF: error_code(0x0002) - not-present page [ 212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0 [ 212.926302] Oops: 0002 [#1] SMP KASAN PTI [ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512 [ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40 [ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8 [ 212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296 [ 212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297 [ 212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d [ 212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040 [ 212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680 [ 212.929999] FS: 00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000 [ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0 [ 212.930588] Call Trace: [ 212.930682] ? tc_del_tfilter+0xa40/0xa40 [ 212.930811] ? __lock_acquire+0x5b5/0x2460 [ 212.930948] ? find_held_lock+0x85/0xa0 [ 212.931081] ? tc_del_tfilter+0xa40/0xa40 [ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 212.931332] ? rtnl_dellink+0x490/0x490 [ 212.931454] ? lockdep_hardirqs_on+0x260/0x260 [ 212.931589] ? netlink_deliver_tap+0xab/0x5a0 [ 212.931717] ? match_held_lock+0x1b/0x240 [ 212.931844] netlink_rcv_skb+0xd0/0x200 [ 212.931958] ? rtnl_dellink+0x490/0x490 [ 212.932079] ? netlink_ack+0x440/0x440 [ 212.932205] ? netlink_deliver_tap+0x161/0x5a0 [ 212.932335] ? lock_downgrade+0x360/0x360 [ 212.932457] ? lock_acquire+0xe5/0x210 [ 212.932579] netlink_unicast+0x296/0x350 [ 212.932705] ? netlink_attachskb+0x390/0x390 [ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0 [ 212.932976] netlink_sendmsg+0x394/0x600 [ 212.937998] ? netlink_unicast+0x350/0x350 [ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90 [ 212.948115] ? netlink_unicast+0x350/0x350 [ 212.953185] sock_sendmsg+0x96/0xa0 [ 212.958099] ___sys_sendmsg+0x482/0x520 [ 212.962881] ? match_held_lock+0x1b/0x240 [ 212.967618] ? copy_msghdr_from_user+0x250/0x250 [ 212.972337] ? lock_downgrade+0x360/0x360 [ 212.976973] ? rwlock_bug.part.0+0x60/0x60 [ 212.981548] ? __mod_node_page_state+0x1f/0xa0 [ 212.986060] ? match_held_lock+0x1b/0x240 [ 212.990567] ? find_held_lock+0x85/0xa0 [ 212.994989] ? do_user_addr_fault+0x349/0x5b0 [ 212.999387] ? lock_downgrade+0x360/0x360 [ 213.003713] ? find_held_lock+0x85/0xa0 [ 213.007972] ? __fget_light+0xa1/0xf0 [ 213.012143] ? sockfd_lookup_light+0x91/0xb0 [ 213.016165] __sys_sendmsg+0xba/0x130 [ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0 [ 213.023870] ? handle_mm_fault+0x337/0x470 [ 213.027592] ? page_fault+0x8/0x30 [ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100 [ 213.034999] ? mark_held_locks+0x24/0x90 [ 213.038671] ? do_syscall_64+0x1e/0xe0 [ 213.042297] do_syscall_64+0x74/0xe0 [ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 213.049354] RIP: 0033:0x7fe7c527c7b8 [ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f 0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8 [ 213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003 [ 213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0 [ 213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080 [ 213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000 [ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common [<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi _ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas [ 213.112326] CR2: 0000000000000010 [ 213.117429] ---[ end trace adb58eb0a4ee6283 ]--- Verify that q pointer is not NULL before setting the 'flags' field. Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-20net: flow_offload: add flow_block structure and use itPablo Neira Ayuso1-3/+7
This object stores the flow block callbacks that are attached to this block. Update flow_block_cb_lookup() to take this new object. This patch restores the block sharing feature. Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-20net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_tPablo Neira Ayuso5-7/+7
Rename this type definition and adapt users. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17net_sched: unset TCQ_F_CAN_BYPASS when adding filtersCong Wang3-4/+1
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS, notably fq_codel, it makes no sense to let packets bypass the TC filters we setup in any scenario, otherwise our packets steering policy could not be enforced. This can be reproduced easily with the following script: ip li add dev dummy0 type dummy ifconfig dummy0 up tc qd add dev dummy0 root fq_codel tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo ping -I dummy0 192.168.112.1 Without this patch, packets are sent directly to dummy0 without hitting any of the filters. With this patch, packets are redirected to loopback as expected. This fix is not perfect, it only unsets the flag but does not set it back because we have to save the information somewhere in the qdisc if we really want that. Note, both fq_codel and sfq clear this flag in their ->bind_tcf() but this is clearly not sufficient when we don't use any class ID. Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17net/sched: Make NET_ACT_CT depends on NF_NATYueHaibing1-1/+1
If NF_NAT is m and NET_ACT_CT is y, build fails: net/sched/act_ct.o: In function `tcf_ct_act': act_ct.c:(.text+0x21ac): undefined reference to `nf_ct_nat_ext_add' act_ct.c:(.text+0x229a): undefined reference to `nf_nat_icmp_reply_translation' act_ct.c:(.text+0x233a): undefined reference to `nf_nat_setup_info' act_ct.c:(.text+0x234a): undefined reference to `nf_nat_alloc_null_binding' act_ct.c:(.text+0x237c): undefined reference to `nf_nat_packet' Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-17fix: taprio: Change type of txtime-delay parameter to u32Vedang Patel1-3/+3
During the review of the iproute2 patches for txtime-assist mode, it was pointed out that it does not make sense for the txtime-delay parameter to be negative. So, change the type of the parameter from s32 to u32. Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-13net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd()Vlad Buslov1-1/+1
After recent refactoring of block offlads infrastructure, indr_dev->block pointer is dereferenced before it is verified to be non-NULL. Example stack trace where this behavior leads to NULL-pointer dereference error when creating vxlan dev on system with mlx5 NIC with offloads enabled: [ 1157.852938] ================================================================== [ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160 [ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829 [ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488 [ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 1157.929031] Call Trace: [ 1157.938318] dump_stack+0x9a/0xeb [ 1157.948362] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160 [ 1157.960262] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160 [ 1157.972082] __kasan_report+0x176/0x192 [ 1157.982513] ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160 [ 1157.994348] kasan_report+0xe/0x20 [ 1158.004324] tc_indr_block_ing_cmd.isra.41+0x9c/0x160 [ 1158.015950] ? tcf_block_setup+0x430/0x430 [ 1158.026558] ? kasan_unpoison_shadow+0x30/0x40 [ 1158.037464] __tc_indr_block_cb_register+0x5f5/0xf20 [ 1158.049288] ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core] [ 1158.062344] ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0 [ 1158.074498] ? rdma_roce_rescan_device+0x20/0x20 [ib_core] [ 1158.086580] ? br_device_event+0x98/0x480 [bridge] [ 1158.097870] ? strcmp+0x30/0x50 [ 1158.107578] mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core] [ 1158.120212] notifier_call_chain+0x6d/0xa0 [ 1158.130753] register_netdevice+0x6fc/0x7e0 [ 1158.141322] ? netdev_change_features+0xa0/0xa0 [ 1158.152218] ? vxlan_config_apply+0x210/0x310 [vxlan] [ 1158.163593] __vxlan_dev_create+0x2ad/0x520 [vxlan] [ 1158.174770] ? vxlan_changelink+0x490/0x490 [vxlan] [ 1158.185870] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 1158.196798] vxlan_newlink+0x99/0xf0 [vxlan] [ 1158.207303] ? __vxlan_dev_create+0x520/0x520 [vxlan] [ 1158.218601] ? rtnl_create_link+0x3d0/0x450 [ 1158.228900] __rtnl_newlink+0x8a7/0xb00 [ 1158.238701] ? stack_access_ok+0x35/0x80 [ 1158.248450] ? rtnl_link_unregister+0x1a0/0x1a0 [ 1158.258735] ? find_held_lock+0x6d/0xd0 [ 1158.268379] ? is_bpf_text_address+0x67/0xf0 [ 1158.278330] ? lock_acquire+0xc1/0x1f0 [ 1158.287686] ? is_bpf_text_address+0x5/0xf0 [ 1158.297449] ? is_bpf_text_address+0x86/0xf0 [ 1158.307310] ? kernel_text_address+0xec/0x100 [ 1158.317155] ? arch_stack_walk+0x92/0xe0 [ 1158.326497] ? __kernel_text_address+0xe/0x30 [ 1158.336213] ? unwind_get_return_address+0x2f/0x50 [ 1158.346267] ? create_prof_cpu_mask+0x20/0x20 [ 1158.355936] ? arch_stack_walk+0x92/0xe0 [ 1158.365117] ? stack_trace_save+0x8a/0xb0 [ 1158.374272] ? stack_trace_consume_entry+0x80/0x80 [ 1158.384226] ? match_held_lock+0x33/0x210 [ 1158.393216] ? kasan_unpoison_shadow+0x30/0x40 [ 1158.402593] rtnl_newlink+0x53/0x80 [ 1158.410925] rtnetlink_rcv_msg+0x3a5/0x600 [ 1158.419777] ? validate_linkmsg+0x400/0x400 [ 1158.428620] ? find_held_lock+0x6d/0xd0 [ 1158.437117] ? match_held_lock+0x1b/0x210 [ 1158.445760] ? validate_linkmsg+0x400/0x400 [ 1158.454642] netlink_rcv_skb+0xc7/0x1f0 [ 1158.463150] ? netlink_ack+0x470/0x470 [ 1158.471538] ? netlink_deliver_tap+0x1f3/0x5a0 [ 1158.480607] netlink_unicast+0x2ae/0x350 [ 1158.489099] ? netlink_attachskb+0x340/0x340 [ 1158.497935] ? _copy_from_iter_full+0xde/0x3b0 [ 1158.506945] ? __virt_addr_valid+0xb6/0xf0 [ 1158.515578] ? __check_object_size+0x159/0x240 [ 1158.524515] netlink_sendmsg+0x4d3/0x630 [ 1158.532879] ? netlink_unicast+0x350/0x350 [ 1158.541400] ? netlink_unicast+0x350/0x350 [ 1158.549805] sock_sendmsg+0x94/0xa0 [ 1158.557561] ___sys_sendmsg+0x49d/0x570 [ 1158.565625] ? copy_msghdr_from_user+0x210/0x210 [ 1158.574457] ? __fput+0x1e2/0x330 [ 1158.581948] ? __kasan_slab_free+0x130/0x180 [ 1158.590407] ? kmem_cache_free+0xb6/0x2d0 [ 1158.598574] ? mark_lock+0xc7/0x790 [ 1158.606177] ? task_work_run+0xcf/0x100 [ 1158.614165] ? exit_to_usermode_loop+0x102/0x110 [ 1158.622954] ? __lock_acquire+0x963/0x1ee0 [ 1158.631199] ? lockdep_hardirqs_on+0x260/0x260 [ 1158.639777] ? match_held_lock+0x1b/0x210 [ 1158.647918] ? lockdep_hardirqs_on+0x260/0x260 [ 1158.656501] ? match_held_lock+0x1b/0x210 [ 1158.664643] ? __fget_light+0xa6/0xe0 [ 1158.672423] ? __sys_sendmsg+0xd2/0x150 [ 1158.680334] __sys_sendmsg+0xd2/0x150 [ 1158.688063] ? __ia32_sys_shutdown+0x30/0x30 [ 1158.696435] ? lock_downgrade+0x2e0/0x2e0 [ 1158.704541] ? mark_held_locks+0x1a/0x90 [ 1158.712611] ? mark_held_locks+0x1a/0x90 [ 1158.720619] ? do_syscall_64+0x1e/0x2c0 [ 1158.728530] do_syscall_64+0x78/0x2c0 [ 1158.736254] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1158.745414] RIP: 0033:0x7f62d505cb87 [ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817] 48 89 f3 48 [ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87 [ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003 [ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001 [ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20 [ 1158.852873] ================================================================== Introduce new function tcf_block_non_null_shared() that verifies block pointer before dereferencing it to obtain index. Use the function in tc_indr_block_ing_cmd() to prevent NULL pointer dereference. Fixes: 955bcb6ea0df ("drivers: net: use flow block API") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>