summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2021-07-19net: sched: fix error return code in tcf_del_walker()Yang Yingliang1-1/+2
[ Upstream commit 55d96f72e8ddc0a294e0b9c94016edbb699537e1 ] When nla_put_u32() fails, 'ret' could be 0, it should return error code in tcf_del_walker(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: sched: fix warning in tcindex_alloc_perfect_hashPavel Skripkin1-1/+1
[ Upstream commit 3f2db250099f46988088800052cdf2332c7aba61 ] Syzbot reported warning in tcindex_alloc_perfect_hash. The problem was in too big cp->hash, which triggers warning in kmalloc. Since cp->hash comes from userspace, there is no need to warn if value is not correct Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") Reported-and-tested-by: syzbot+1071ad60cd7df39fdadb@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14pkt_sched: sch_qfq: fix qfq_change_class() error pathEric Dumazet1-5/+3
[ Upstream commit 0cd58e5c53babb9237b741dbef711f0a9eb6d3fd ] If qfq_change_class() is unable to allocate memory for qfq_aggregate, it frees the class that has been inserted in the class hash table, but does not unhash it. Defer the insertion after the problematic allocation. BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:884 [inline] BUG: KASAN: use-after-free in qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731 Write of size 8 at addr ffff88814a534f10 by task syz-executor.4/31478 CPU: 0 PID: 31478 Comm: syz-executor.4 Not tainted 5.13.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436 hlist_add_head include/linux/list.h:884 [inline] qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731 qfq_change_class+0x96c/0x1990 net/sched/sch_qfq.c:489 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665d9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdc7b5f0188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00007fdc7b5f01d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 00007ffcf7310b3f R14: 00007fdc7b5f0300 R15: 0000000000022000 Allocated by task 31445: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:428 [inline] ____kasan_kmalloc mm/kasan/common.c:507 [inline] ____kasan_kmalloc mm/kasan/common.c:466 [inline] __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:516 kmalloc include/linux/slab.h:556 [inline] kzalloc include/linux/slab.h:686 [inline] qfq_change_class+0x705/0x1990 net/sched/sch_qfq.c:464 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 31445: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357 ____kasan_slab_free mm/kasan/common.c:360 [inline] ____kasan_slab_free mm/kasan/common.c:325 [inline] __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368 kasan_slab_free include/linux/kasan.h:212 [inline] slab_free_hook mm/slub.c:1583 [inline] slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1608 slab_free mm/slub.c:3168 [inline] kfree+0xe5/0x7f0 mm/slub.c:4212 qfq_change_class+0x10fb/0x1990 net/sched/sch_qfq.c:518 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88814a534f00 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 16 bytes inside of 128-byte region [ffff88814a534f00, ffff88814a534f80) The buggy address belongs to the page: page:ffffea0005294d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14a534 flags: 0x57ff00000000200(slab|node=1|zone=2|lastcpupid=0x7ff) raw: 057ff00000000200 ffffea00004fee00 0000000600000006 ffff8880110418c0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 29797, ts 604817765317, free_ts 604810151744 prep_new_page mm/page_alloc.c:2358 [inline] get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200 alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272 alloc_slab_page mm/slub.c:1646 [inline] allocate_slab+0x2c5/0x4c0 mm/slub.c:1786 new_slab mm/slub.c:1849 [inline] new_slab_objects mm/slub.c:2595 [inline] ___slab_alloc+0x4a1/0x810 mm/slub.c:2758 __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2798 slab_alloc_node mm/slub.c:2880 [inline] slab_alloc mm/slub.c:2922 [inline] __kmalloc+0x315/0x330 mm/slub.c:4050 kmalloc include/linux/slab.h:561 [inline] kzalloc include/linux/slab.h:686 [inline] __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318 mpls_dev_sysctl_register+0x1b7/0x2d0 net/mpls/af_mpls.c:1421 mpls_add_dev net/mpls/af_mpls.c:1472 [inline] mpls_dev_notify+0x214/0x8b0 net/mpls/af_mpls.c:1588 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2121 call_netdevice_notifiers_extack net/core/dev.c:2133 [inline] call_netdevice_notifiers net/core/dev.c:2147 [inline] register_netdevice+0x106b/0x1500 net/core/dev.c:10312 veth_newlink+0x585/0xac0 drivers/net/veth.c:1547 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3452 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1298 [inline] free_pcp_prepare+0x223/0x300 mm/page_alloc.c:1342 free_unref_page_prepare mm/page_alloc.c:3250 [inline] free_unref_page+0x12/0x1d0 mm/page_alloc.c:3298 __vunmap+0x783/0xb60 mm/vmalloc.c:2566 free_work+0x58/0x70 mm/vmalloc.c:80 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422 kthread+0x3b1/0x4a0 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Memory state around the buggy address: ffff88814a534e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88814a534e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88814a534f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88814a534f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88814a535000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: 462dbc9101acd ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net/sched: act_vlan: Fix modify to allow 0Boris Sukholitko1-2/+5
[ Upstream commit 9c5eee0afca09cbde6bd00f77876754aaa552970 ] Currently vlan modification action checks existence of vlan priority by comparing it to 0. Therefore it is impossible to modify existing vlan tag to have priority 0. For example, the following tc command will change the vlan id but will not affect vlan priority: tc filter add dev eth1 ingress matchall action vlan modify id 300 \ priority 0 pipe mirred egress redirect dev eth2 The incoming packet on eth1: ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4 will be changed to: ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4 although the user has intended to have p == 0. The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params and rely on it when deciding to set the priority. Fixes: 45a497f2d149a4a8061c (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23sch_cake: Fix out of bounds when parsing TCP options and headerMaxim Mikityanskiy1-1/+5
[ Upstream commit ba91c49dedbde758ba0b72f57ac90b06ddf8e548 ] The TCP option parser in cake qdisc (cake_get_tcpopt and cake_tcph_may_drop) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1. This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options."). v2 changes: Added doff validation in cake_get_tcphdr to avoid parsing garbage as TCP header. Although it wasn't strictly an out-of-bounds access (memory was allocated), garbage values could be read where CAKE expected the TCP header if doff was smaller than 5. Cc: Young Xiao <92siuyang@gmail.com> Fixes: 8b7138814f29 ("sch_cake: Add optional ACK filter") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23net/sched: act_ct: handle DNAT tuple collisionMarcelo Ricardo Leitner1-8/+13
[ Upstream commit 13c62f5371e3eb4fc3400cfa26e64ca75f888008 ] This this the counterpart of 8aa7b526dc0b ("openvswitch: handle DNAT tuple collision") for act_ct. From that commit changelog: """ With multiple DNAT rules it's possible that after destination translation the resulting tuples collide. ... Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction. """ Fixes: 95219afbb980 ("act_ct: support asymmetric conntrack") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10net/sched: act_ct: Fix ct template allocation for zone 0Ariel Levkovich1-3/+0
[ Upstream commit fb91702b743dec78d6507c53a2dec8a8883f509d ] Fix current behavior of skipping template allocation in case the ct action is in zone 0. Skipping the allocation may cause the datapath ct code to ignore the entire ct action with all its attributes (commit, nat) in case the ct action in zone 0 was preceded by a ct clear action. The ct clear action sets the ct_state to untracked and resets the skb->_nfct pointer. Under these conditions and without an allocated ct template, the skb->_nfct pointer will remain NULL which will cause the tc ct action handler to exit without handling commit and nat actions, if such exist. For example, the following rule in OVS dp: recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \ in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \ recirc(0x37a) Will result in act_ct skipping the commit and nat actions in zone 0. The change removes the skipping of template allocation for zone 0 and treats it the same as any other zone. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03sch_dsmark: fix a NULL deref in qdisc_reset()Taehee Yoo1-1/+2
[ Upstream commit 9b76eade16423ef06829cccfe3e100cfce31afcd ] If Qdisc_ops->init() is failed, Qdisc_ops->reset() would be called. When dsmark_init(Qdisc_ops->init()) is failed, it possibly doesn't initialize dsmark_qdisc_data->q. But dsmark_reset(Qdisc_ops->reset()) uses dsmark_qdisc_data->q pointer wihtout any null checking. So, panic would occur. Test commands: sysctl net.core.default_qdisc=dsmark -w ip link add dummy0 type dummy ip link add vw0 link dummy0 type virt_wifi ip link set vw0 up Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 3 PID: 684 Comm: ip Not tainted 5.12.0+ #910 RIP: 0010:qdisc_reset+0x2b/0x680 Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54 55 48 89 fd 48 83 c7 18 53 48 89 fa 48 c1 ea 03 48 83 ec 20 <80> 3c 02 00 0f 85 09 06 00 00 4c 8b 65 18 0f 1f 44 00 00 65 8b 1d RSP: 0018:ffff88800fda6bf8 EFLAGS: 00010282 RAX: dffffc0000000000 RBX: ffff8880050ed800 RCX: 0000000000000000 RDX: 0000000000000003 RSI: ffffffff99e34100 RDI: 0000000000000018 RBP: 0000000000000000 R08: fffffbfff346b553 R09: fffffbfff346b553 R10: 0000000000000001 R11: fffffbfff346b552 R12: ffffffffc0824940 R13: ffff888109e83800 R14: 00000000ffffffff R15: ffffffffc08249e0 FS: 00007f5042287680(0000) GS:ffff888119800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055ae1f4dbd90 CR3: 0000000006760002 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? rcu_read_lock_bh_held+0xa0/0xa0 dsmark_reset+0x3d/0xf0 [sch_dsmark] qdisc_reset+0xa9/0x680 qdisc_destroy+0x84/0x370 qdisc_create_dflt+0x1fe/0x380 attach_one_default_qdisc.constprop.41+0xa4/0x180 dev_activate+0x4d5/0x8c0 ? __dev_open+0x268/0x390 __dev_open+0x270/0x390 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: sched: fix tx action reschedule issue with stopped queueYunsheng Lin1-1/+26
[ Upstream commit dcad9ee9e0663d74a89b25b987f9c7be86432812 ] The netdev qeueue might be stopped when byte queue limit has reached or tx hw ring is full, net_tx_action() may still be rescheduled if STATE_MISSED is set, which consumes unnecessary cpu without dequeuing and transmiting any skb because the netdev queue is stopped, see qdisc_run_end(). This patch fixes it by checking the netdev queue state before calling qdisc_run() and clearing STATE_MISSED if netdev queue is stopped during qdisc_run(), the net_tx_action() is rescheduled again when netdev qeueue is restarted, see netif_tx_wake_queue(). As there is time window between netif_xmit_frozen_or_stopped() checking and STATE_MISSED clearing, between which STATE_MISSED may set by net_tx_action() scheduled by netif_tx_wake_queue(), so set the STATE_MISSED again if netdev queue is restarted. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Reported-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: sched: fix tx action rescheduling issue during deactivationYunsheng Lin1-1/+3
[ Upstream commit 102b55ee92f9fda4dde7a45d2b20538e6e3e3d1e ] Currently qdisc_run() checks the STATE_DEACTIVATED of lockless qdisc before calling __qdisc_run(), which ultimately clear the STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED is set before clearing STATE_MISSED, there may be rescheduling of net_tx_action() at the end of qdisc_run_end(), see below: CPU0(net_tx_atcion) CPU1(__dev_xmit_skb) CPU2(dev_deactivate) . . . . set STATE_MISSED . . __netif_schedule() . . . set STATE_DEACTIVATED . . qdisc_reset() . . . .<--------------- . synchronize_net() clear __QDISC_STATE_SCHED | . . . | . . . | . some_qdisc_is_busy() . | . return *false* . | . . test STATE_DEACTIVATED | . . __qdisc_run() *not* called | . . . | . . test STATE_MISS | . . __netif_schedule()--------| . . . . . . . . __qdisc_run() is not called by net_tx_atcion() in CPU0 because CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule is called at the end of qdisc_run_end(), causing tx action rescheduling problem. qdisc_run() called by net_tx_action() runs in the softirq context, which should has the same semantic as the qdisc_run() called by __dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a synchronize_net() between STATE_DEACTIVATED flag being set and qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely bail out for the deactived lockless qdisc in net_tx_action(), and qdisc_reset() will reset all skb not dequeued yet. So add the rcu_read_lock() explicitly to protect the qdisc_run() and do the STATE_DEACTIVATED checking in net_tx_action() before calling qdisc_run_begin(). Another option is to do the checking in the qdisc_run_end(), but it will add unnecessary overhead for non-tx_action case, because __dev_queue_xmit() will not see qdisc with STATE_DEACTIVATED after synchronize_net(), the qdisc with STATE_DEACTIVATED can only be seen by net_tx_action() because of __netif_schedule(). The STATE_DEACTIVATED checking in qdisc_run() is to avoid race between net_tx_action() and qdisc_reset(), see: commit d518d2ed8640 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc"). As the bailout added above for deactived lockless qdisc in net_tx_action() provides better protection for the race without calling qdisc_run() at all, so remove the STATE_DEACTIVATED checking in qdisc_run(). After qdisc_reset(), there is no skb in qdisc to be dequeued, so clear the STATE_MISSED in dev_reset_queue() too. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> V8: Clearing STATE_MISSED before calling __netif_schedule() has avoid the endless rescheduling problem, but there may still be a unnecessary rescheduling, so adjust the commit log. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03net: sched: fix packet stuck problem for lockless qdiscYunsheng Lin1-0/+19
[ Upstream commit a90c57f2cedd52a511f739fb55e6244e22e1a2fb ] Lockless qdisc has below concurrent problem: cpu0 cpu1 . . q->enqueue . . . qdisc_run_begin() . . . dequeue_skb() . . . sch_direct_xmit() . . . . q->enqueue . qdisc_run_begin() . return and do nothing . . qdisc_run_end() . cpu1 enqueue a skb without calling __qdisc_run() because cpu0 has not released the lock yet and spin_trylock() return false for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb enqueued by cpu1 when calling dequeue_skb() because cpu1 may enqueue the skb after cpu0 calling dequeue_skb() and before cpu0 calling qdisc_run_end(). Lockless qdisc has below another concurrent problem when tx_action is involved: cpu0(serving tx_action) cpu1 cpu2 . . . . q->enqueue . . qdisc_run_begin() . . dequeue_skb() . . . q->enqueue . . . . sch_direct_xmit() . . . qdisc_run_begin() . . return and do nothing . . . clear __QDISC_STATE_SCHED . . qdisc_run_begin() . . return and do nothing . . . . . . qdisc_run_end() . This patch fixes the above data race by: 1. If the first spin_trylock() return false and STATE_MISSED is not set, set STATE_MISSED and retry another spin_trylock() in case other CPU may not see STATE_MISSED after it releases the lock. 2. reschedule if STATE_MISSED is set after the lock is released at the end of qdisc_run_end(). For tx_action case, STATE_MISSED is also set when cpu1 is at the end if qdisc_run_end(), so tx_action will be rescheduled again to dequeue the skb enqueued by cpu2. Clear STATE_MISSED before retrying a dequeuing when dequeuing returns NULL in order to reduce the overhead of the second spin_trylock() and __netif_schedule() calling. Also clear the STATE_MISSED before calling __netif_schedule() at the end of qdisc_run_end() to avoid doing another round of dequeuing in the pfifo_fast_dequeue(). The performance impact of this patch, tested using pktgen and dummy netdev with pfifo_fast qdisc attached: threads without+this_patch with+this_patch delta 1 2.61Mpps 2.60Mpps -0.3% 2 3.97Mpps 3.82Mpps -3.7% 4 5.62Mpps 5.59Mpps -0.5% 8 2.78Mpps 2.77Mpps -0.3% 16 2.22Mpps 2.22Mpps -0.0% Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19net: sched: tapr: prevent cycle_time == 0 in parse_taprio_scheduleDu Cheng1-0/+6
[ Upstream commit ed8157f1ebf1ae81a8fa2653e3f20d2076fad1c9 ] There is a reproducible sequence from the userland that will trigger a WARN_ON() condition in taprio_get_start_time, which causes kernel to panic if configured as "panic_on_warn". Catch this condition in parse_taprio_schedule to prevent this condition. Reported as bug on syzkaller: https://syzkaller.appspot.com/bug?extid=d50710fd0873a9c6b40c Reported-by: syzbot+d50710fd0873a9c6b40c@syzkaller.appspotmail.com Signed-off-by: Du Cheng <ducheng2@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net: sched: bump refcount for new action in ACT replace modeKumar Kartikeya Dwivedi1-0/+3
commit 6855e8213e06efcaf7c02a15e12b1ae64b9a7149 upstream. Currently, action creation using ACT API in replace mode is buggy. When invoking for non-existent action index 42, tc action replace action bpf obj foo.o sec <xyz> index 42 kernel creates the action, fills up the netlink response, and then just deletes the action after notifying userspace. tc action show action bpf doesn't list the action. This happens due to the following sequence when ovr = 1 (replace mode) is enabled: tcf_idr_check_alloc is used to atomically check and either obtain reference for existing action at index, or reserve the index slot using a dummy entry (ERR_PTR(-EBUSY)). This is necessary as pointers to these actions will be held after dropping the idrinfo lock, so bumping the reference count is necessary as we need to insert the actions, and notify userspace by dumping their attributes. Finally, we drop the reference we took using the tcf_action_put_many call in tcf_action_add. However, for the case where a new action is created due to free index, its refcount remains one. This when paired with the put_many call leads to the kernel setting up the action, notifying userspace of its creation, and then tearing it down. For existing actions, the refcount is still held so they remain unaffected. Fortunately due to rtnl_lock serialization requirement, such an action with refcount == 1 will not be concurrently deleted by anything else, at best CLS API can move its refcount up and down by binding to it after it has been published from tcf_idr_insert_many. Since refcount is atleast one until put_many call, CLS API cannot delete it. Also __tcf_action_put release path already ensures deterministic outcome (either new action will be created or existing action will be reused in case CLS API tries to bind to action concurrently) due to idr lock serialization. We fix this by making refcount of newly created actions as 2 in ACT API replace mode. A relaxed store will suffice as visibility is ensured only after the tcf_idr_insert_many call. Note that in case of creation or overwriting using CLS API only (i.e. bind = 1), overwriting existing action object is not allowed, and any such request is silently ignored (without error). The refcount bump that occurs in tcf_idr_check_alloc call there for existing action will pair with tcf_exts_destroy call made from the owner module for the same action. In case of action creation, there is no existing action, so no tcf_exts_destroy callback happens. This means no code changes for CLS API. Fixes: cae422f379f3 ("net: sched: use reference counting action init") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14net: sched: sch_teql: fix null-pointer dereferencePavel Tikhomirov1-0/+3
commit 1ffbc7ea91606e4abd10eb60de5367f1c86daf5e upstream. Reproduce: modprobe sch_teql tc qdisc add dev teql0 root teql0 This leads to (for instance in Centos 7 VM) OOPS: [ 532.366633] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 532.366733] IP: [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.366825] PGD 80000001376d5067 PUD 137e37067 PMD 0 [ 532.366906] Oops: 0000 [#1] SMP [ 532.366987] Modules linked in: sch_teql ... [ 532.367945] CPU: 1 PID: 3026 Comm: tc Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 [ 532.368041] Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.2 04/01/2014 [ 532.368125] task: ffff8b7d37d31070 ti: ffff8b7c9fdbc000 task.ti: ffff8b7c9fdbc000 [ 532.368224] RIP: 0010:[<ffffffffc06124a8>] [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.368320] RSP: 0018:ffff8b7c9fdbf8e0 EFLAGS: 00010286 [ 532.368394] RAX: ffffffffc0612490 RBX: ffff8b7cb1565e00 RCX: ffff8b7d35ba2000 [ 532.368476] RDX: ffff8b7d35ba2000 RSI: 0000000000000000 RDI: ffff8b7cb1565e00 [ 532.368557] RBP: ffff8b7c9fdbf8f8 R08: ffff8b7d3fd1f140 R09: ffff8b7d3b001600 [ 532.368638] R10: ffff8b7d3b001600 R11: ffffffff84c7d65b R12: 00000000ffffffd8 [ 532.368719] R13: 0000000000008000 R14: ffff8b7d35ba2000 R15: ffff8b7c9fdbf9a8 [ 532.368800] FS: 00007f6a4e872740(0000) GS:ffff8b7d3fd00000(0000) knlGS:0000000000000000 [ 532.368885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 532.368961] CR2: 00000000000000a8 CR3: 00000001396ee000 CR4: 00000000000206e0 [ 532.369046] Call Trace: [ 532.369159] [<ffffffff84c8192e>] qdisc_create+0x36e/0x450 [ 532.369268] [<ffffffff846a9b49>] ? ns_capable+0x29/0x50 [ 532.369366] [<ffffffff849afde2>] ? nla_parse+0x32/0x120 [ 532.369442] [<ffffffff84c81b4c>] tc_modify_qdisc+0x13c/0x610 [ 532.371508] [<ffffffff84c693e7>] rtnetlink_rcv_msg+0xa7/0x260 [ 532.372668] [<ffffffff84907b65>] ? sock_has_perm+0x75/0x90 [ 532.373790] [<ffffffff84c69340>] ? rtnl_newlink+0x890/0x890 [ 532.374914] [<ffffffff84c8da7b>] netlink_rcv_skb+0xab/0xc0 [ 532.376055] [<ffffffff84c63708>] rtnetlink_rcv+0x28/0x30 [ 532.377204] [<ffffffff84c8d400>] netlink_unicast+0x170/0x210 [ 532.378333] [<ffffffff84c8d7a8>] netlink_sendmsg+0x308/0x420 [ 532.379465] [<ffffffff84c2f3a6>] sock_sendmsg+0xb6/0xf0 [ 532.380710] [<ffffffffc034a56e>] ? __xfs_filemap_fault+0x8e/0x1d0 [xfs] [ 532.381868] [<ffffffffc034a75c>] ? xfs_filemap_fault+0x2c/0x30 [xfs] [ 532.383037] [<ffffffff847ec23a>] ? __do_fault.isra.61+0x8a/0x100 [ 532.384144] [<ffffffff84c30269>] ___sys_sendmsg+0x3e9/0x400 [ 532.385268] [<ffffffff847f3fad>] ? handle_mm_fault+0x39d/0x9b0 [ 532.386387] [<ffffffff84d88678>] ? __do_page_fault+0x238/0x500 [ 532.387472] [<ffffffff84c31921>] __sys_sendmsg+0x51/0x90 [ 532.388560] [<ffffffff84c31972>] SyS_sendmsg+0x12/0x20 [ 532.389636] [<ffffffff84d8dede>] system_call_fastpath+0x25/0x2a [ 532.390704] [<ffffffff84d8de21>] ? system_call_after_swapgs+0xae/0x146 [ 532.391753] Code: 00 00 00 00 00 00 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b b7 48 01 00 00 48 89 fb <48> 8b 8e a8 00 00 00 48 85 c9 74 43 48 89 ca eb 0f 0f 1f 80 00 [ 532.394036] RIP [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.395127] RSP <ffff8b7c9fdbf8e0> [ 532.396179] CR2: 00000000000000a8 Null pointer dereference happens on master->slaves dereference in teql_destroy() as master is null-pointer. When qdisc_create() calls teql_qdisc_init() it imediately fails after check "if (m->dev == dev)" because both devices are teql0, and it does not set qdisc_priv(sch)->m leaving it zero on error path, then qdisc_create() imediately calls teql_destroy() which does not expect zero master pointer and we get OOPS. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30net: sched: validate stab valuesEric Dumazet4-7/+11
[ Upstream commit e323d865b36134e8c5c82c834df89109a5c60dab ] iproute2 package is well behaved, but malicious user space can provide illegal shift values and trigger UBSAN reports. Add stab parameter to red_check_params() to validate user input. syzbot reported: UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18 shift exponent 111 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 red_calc_qavg_from_idle_time include/net/red.h:312 [inline] red_calc_qavg include/net/red.h:353 [inline] choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221 __dev_xmit_skb net/core/dev.c:3837 [inline] __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117 __ip6_finish_output net/ipv6/ip6_output.c:182 [inline] __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161 ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192 NF_HOOK_COND include/linux/netfilter.h:290 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215 dst_output include/net/dst.h:448 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320 inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135 dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138 dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535 dccp_finish_passive_close net/dccp/proto.c:123 [inline] dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118 dccp_terminate_connection net/dccp/proto.c:958 [inline] dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release+0xcd/0x280 net/socket.c:599 sock_close+0x18/0x20 net/socket.c:1258 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17net: sched: avoid duplicates in classes dumpMaximilian Heyne1-4/+4
commit bfc2560563586372212b0a8aeca7428975fa91fe upstream. This is a follow up of commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") which has fixed the issue only for the qdisc dump. The duplicate printing also occurs when dumping the classes via tc class show dev eth0 Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04net: sched: fix police ext initializationVlad Buslov2-1/+2
commit 396d7f23adf9e8c436dd81a69488b5b6a865acf8 upstream. When police action is created by cls API tcf_exts_validate() first conditional that calls tcf_action_init_1() directly, the action idr is not updated according to latest changes in action API that require caller to commit newly created action to idr with tcf_idr_insert_many(). This results such action not being accessible through act API and causes crash reported by syzbot: ================================================================== BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline] BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204 CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:400 [inline] kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 ================================================================== Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G B 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 panic+0x306/0x73d kernel/panic.c:231 end_report+0x58/0x5e mm/kasan/report.c:100 __kasan_report mm/kasan/report.c:403 [inline] kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Kernel Offset: disabled Fix the issue by calling tcf_idr_insert_many() after successful action initialization. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net_sched: reject silly cell_log in qdisc_get_rtab()Eric Dumazet1-1/+2
commit e4bedf48aaa5552bc1f49703abd17606e7e6e82a upstream. iproute2 probably never goes beyond 8 for the cell exponent, but stick to the max shift exponent for signed 32bit. UBSAN reported: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22 shift exponent 130 is too large for 32-bit type 'int' CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x183/0x22e lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345 ___sys_sendmsg net/socket.c:2399 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2432 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net_sched: avoid shift-out-of-bounds in tcindex_set_parms()Eric Dumazet1-2/+6
commit bcd0cf19ef8258ac31b9a20248b05c15a1f4b4b0 upstream. tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT attribute is not silly. UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29 shift exponent 255 is too large for 32-bit type 'int' CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 valid_perfect_hash net/sched/cls_tcindex.c:260 [inline] tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425 tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546 tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127 rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336 ___sys_sendmsg+0xf3/0x170 net/socket.c:2390 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12net/sched: sch_taprio: ensure to reset/destroy all child qdiscsDavide Caratti1-1/+1
[ Upstream commit 698285da79f5b0b099db15a37ac661ac408c80eb ] taprio_graft() can insert a NULL element in the array of child qdiscs. As a consquence, taprio_reset() might not reset child qdiscs completely, and taprio_destroy() might leak resources. Fix it by ensuring that loops that iterate over q->qdiscs[] don't end when they find the first NULL item. Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them") Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/13edef6778fef03adc751582562fba4a13e06d6a.1608240532.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12net: sched: prevent invalid Scell_log shift countRandy Dunlap4-4/+4
[ Upstream commit bd1248f1ddbc48b0c30565fce897a3b6423313b8 ] Check Scell_log shift size in red_check_params() and modify all callers of red_check_params() to pass Scell_log. This prevents a shift out-of-bounds as detected by UBSAN: UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22 shift exponent 72 is too large for 32-bit type 'int' Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com Cc: Nogah Frankel <nogahf@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-06net/sched: sch_taprio: reset child qdiscs before freeing themDavide Caratti1-1/+16
[ Upstream commit 44d4775ca51805b376a8db5b34f650434a08e556 ] syzkaller shows that packets can still be dequeued while taprio_destroy() is running. Let sch_taprio use the reset() function to cancel the advance timer and drop all skbs from the child qdiscs. Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Link: https://syzkaller.appspot.com/bug?id=f362872379bf8f0017fb667c1ab158f2d1e764ae Reported-by: syzbot+8971da381fb5a31f542d@syzkaller.appspotmail.com Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/63b6d79b0e830ebb0283e020db4df3cdfdfb2b94.1608142843.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net/sched: act_mpls: ensure LSE is pullable before reading itDavide Caratti1-0/+3
[ Upstream commit 9608fa653059c3f72faab0c148ac8773c46e7314 ] when 'act_mpls' is used to mangle the LSE, the current value is read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is contained in the skb "linear" area. Found by code inspection. v2: - use MPLS_HLEN instead of sizeof(new_lse), thanks to Jakub Kicinski Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/3243506cba43d14858f3bd21ee0994160e44d64a.1606987058.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18net: sch_generic: fix the missing new qdisc assignment bugYunsheng Lin1-0/+3
When commit 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc") is backported to stable kernel, one assignment is missing, which causes two problems reported by Joakim and Vishwanath, see [1] and [2]. So add the assignment back to fix it. 1. https://www.spinics.net/lists/netdev/msg693916.html 2. https://www.spinics.net/lists/netdev/msg695131.html Fixes: 749cc0b0c7f3 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01net/sched: act_mpls: Add softdep on mpls_gso.koGuillaume Nault1-0/+1
TCA_MPLS_ACT_PUSH and TCA_MPLS_ACT_MAC_PUSH might be used on gso packets. Such packets will thus require mpls_gso.ko for segmentation. v2: Drop dependency on CONFIG_NET_MPLS_GSO in Kconfig (from Jakub and David). Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/1f6cab15bbd15666795061c55563aaf6a386e90e.1603708007.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01netem: fix zero division in tabledistAleksandr Nogikh1-1/+8
[ Upstream commit eadd1befdd778a1eca57fad058782bd22b4db804 ] Currently it is possible to craft a special netlink RTM_NEWQDISC command that can result in jitter being equal to 0x80000000. It is enough to set the 32 bit jitter to 0x02000000 (it will later be multiplied by 2^6) or just set the 64 bit jitter via TCA_NETEM_JITTER64. This causes an overflow during the generation of uniformly distributed numbers in tabledist(), which in turn leads to division by zero (sigma != 0, but sigma * 2 is 0). The related fragment of code needs 32-bit division - see commit 9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to 64 bit is not an option. Fix the issue by keeping the value of jitter within the range that can be adequately handled by tabledist() - [0;INT_MAX]. As negative std deviation makes no sense, take the absolute value of the passed value and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit arithmetic in order to prevent overflows. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com Acked-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnelsDavide Caratti1-1/+1
[ Upstream commit a7a12b5a0f950bc6b9f7153390634ea798738db9 ] the following command # tc action add action tunnel_key \ > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0 generates the following splat: BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key] Write of size 4 at addr ffff88813f5f1cc8 by task tc/873 CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x99/0xcb print_address_description.constprop.7+0x1e/0x230 kasan_report.cold.13+0x37/0x7c tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key] tunnel_key_init+0x160c/0x1f40 [act_tunnel_key] tcf_action_init_1+0x5b5/0x850 tcf_action_init+0x15d/0x370 tcf_action_add+0xd9/0x2f0 tc_ctl_action+0x29b/0x3a0 rtnetlink_rcv_msg+0x341/0x8d0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f872a96b338 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338 RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000 Allocated by task 873: kasan_save_stack+0x19/0x40 __kasan_kmalloc.constprop.7+0xc1/0xd0 __kmalloc+0x151/0x310 metadata_dst_alloc+0x20/0x40 tunnel_key_init+0xfff/0x1f40 [act_tunnel_key] tcf_action_init_1+0x5b5/0x850 tcf_action_init+0x15d/0x370 tcf_action_add+0xd9/0x2f0 tc_ctl_action+0x29b/0x3a0 rtnetlink_rcv_msg+0x341/0x8d0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88813f5f1c00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 200 bytes inside of 256-byte region [ffff88813f5f1c00, ffff88813f5f1d00) The buggy address belongs to the page: page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0 head:0000000011b48a19 order:1 compound_mapcount:0 flags: 0x17ffffc0010200(slab|head) raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc ^ ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for the tunnel metadata, but then it expects additional bytes to store tunnel specific metadata with tunnel_key_copy_opts(). Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the size previously computed by tunnel_key_get_opts_len(), like it's done for IPv4 tunnels. Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29net_sched: remove a redundant goto chain checkCong Wang1-14/+0
[ Upstream commit 1aad8049909a6d3379175ef2824a68ac35c0b564 ] All TC actions call tcf_action_check_ctrlact() to validate goto chain, so this check in tcf_action_init_1() is actually redundant. Remove it to save troubles of leaking memory. Fixes: e49d8c22f126 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()") Reported-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Davide Caratti <dcaratti@redhat.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14net_sched: commit action insertions togetherCong Wang1-9/+23
commit 0fedc63fadf0404a729e73a35349481c8009c02f upstream. syzbot is able to trigger a failure case inside the loop in tcf_action_init(), and when this happens we clean up with tcf_action_destroy(). But, as these actions are already inserted into the global IDR, other parallel process could free them before tcf_action_destroy(), then we will trigger a use-after-free. Fix this by deferring the insertions even later, after the loop, and committing all the insertions in a separate loop, so we will never fail in the middle of the insertions any more. One side effect is that the window between alloction and final insertion becomes larger, now it is more likely that the loop in tcf_del_walker() sees the placeholder -EBUSY pointer. So we have to check for error pointer in tcf_del_walker(). Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14net_sched: defer tcf_idr_insert() in tcf_action_init_1()Cong Wang20-61/+21
commit e49d8c22f1261c43a986a7fdbf677ac309682a07 upstream. All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1(). And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion. This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch. Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26taprio: Fix allowing too small intervalsVinicius Costa Gomes1-11/+17
[ Upstream commit b5b73b26b3ca34574124ed7ae9c5ba8391a7f176 ] It's possible that the user specifies an interval that couldn't allow any packet to be transmitted. This also avoids the issue of the hrtimer handler starving the other threads because it's running too often. The solution is to reject interval sizes that according to the current link speed wouldn't allow any packet to be transmitted. Reported-by: syzbot+8267241609ae8c23b248@syzkaller.appspotmail.com Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26net: sch_generic: aviod concurrent reset and enqueue op for lockless qdiscYunsheng Lin1-16/+33
[ Upstream commit 2fb541c862c987d02dfdf28f1545016deecfa0d5 ] Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later. Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26act_ife: load meta modules before tcf_idr_check_alloc()Cong Wang1-10/+34
[ Upstream commit cc8e58f8325cdf14b9516b61c384cdfd02a4f408 ] The following deadlock scenario is triggered by syzbot: Thread A: Thread B: tcf_idr_check_alloc() ... populate_metalist() rtnl_unlock() rtnl_lock() ... request_module() tcf_idr_check_alloc() rtnl_lock() At this point, thread A is waiting for thread B to release RTNL lock, while thread B is waiting for thread A to commit the IDR change with tcf_idr_insert() later. Break this deadlock situation by preloading ife modules earlier, before tcf_idr_check_alloc(), this is fine because we only need to load modules we need potentially. Reported-and-tested-by: syzbot+80e32b5d1f9923f8ace6@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-12taprio: Fix using wrong queues in gate maskVinicius Costa Gomes1-6/+24
[ Upstream commit 09e31cf0c528dac3358a081dc4e773d1b3de1bc9 ] Since commit 9c66d1564676 ("taprio: Add support for hardware offloading") there's a bit of inconsistency when offloading schedules to the hardware: In software mode, the gate masks are specified in terms of traffic classes, so if say "sched-entry S 03 20000", it means that the traffic classes 0 and 1 are open for 20us; when taprio is offloaded to hardware, the gate masks are specified in terms of hardware queues. The idea here is to fix hardware offloading, so schedules in hardware and software mode have the same behavior. What's needed to do is to map traffic classes to queues when applying the offload to the driver. Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flowAlaa Hleihel1-1/+1
[ Upstream commit eda814b97dfb8d9f4808eb2f65af9bd3705c4cae ] tcf_ct_handle_fragments() shouldn't free the skb when ip_defrag() call fails. Otherwise, we will cause a double-free bug. In such cases, just return the error to the caller. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22sched: consistently handle layer3 header accesses in the presence of VLANsToke Høiland-Jørgensen15-29/+34
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ] There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22net_sched: fix a memory leak in atm_tc_init()Cong Wang1-4/+4
[ Upstream commit 306381aec7c2b5a658eebca008c8a1b666536cba ] When tcf_block_get() fails inside atm_tc_init(), atm_tc_put() is called to release the qdisc p->link.q. But the flow->ref prevents it to do so, as the flow->ref is still zero. Fix this by moving the p->link.ref initialization before tcf_block_get(). Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30net: sched: export __netdev_watchdog_up()Valentin Longchamp1-0/+1
[ Upstream commit 1a3db27ad9a72d033235b9673653962c02e3486e ] Since the quiesce/activate rework, __netdev_watchdog_up() is directly called in the ucc_geth driver. Unfortunately, this function is not available for modules and thus ucc_geth cannot be built as a module anymore. Fix it by exporting __netdev_watchdog_up(). Since the commit introducing the regression was backported to stable branches, this one should ideally be as well. Fixes: 79dde73cf9bc ("net/ethernet/freescale: rework quiesce/activate for ucc_geth") Signed-off-by: Valentin Longchamp <valentin@longchamp.me> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30sch_cake: fix a few style nitsToke Høiland-Jørgensen1-2/+2
[ Upstream commit 3f608f0c41360b11b04c763f348b712f651c8bac ] I spotted a few nits when comparing the in-tree version of sch_cake with the out-of-tree one: A redundant error variable declaration shadowing an outer declaration, and an indentation alignment issue. Fix both of these. Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30sch_cake: don't call diffserv parsing code when it is not neededToke Høiland-Jørgensen1-4/+9
[ Upstream commit 8c95eca0bb8c4bd2231a0d581f1ad0d50c90488c ] As a further optimisation of the diffserv parsing codepath, we can skip it entirely if CAKE is configured to neither use diffserv-based classification, nor to zero out the diffserv bits. Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30sch_cake: don't try to reallocate or unshare skb unconditionallyIlya Ponetayev1-11/+30
[ Upstream commit 9208d2863ac689a563b92f2161d8d1e7127d0add ] cake_handle_diffserv() tries to linearize mac and network header parts of skb and to make it writable unconditionally. In some cases it leads to full skb reallocation, which reduces throughput and increases CPU load. Some measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable() reallocates skb, if skb was allocated in ethernet driver via so-called 'build skb' method from page cache (it was discovered by strange increase of kmalloc-2048 slab at first). Obtain DSCP value via read-only skb_header_pointer() call, and leave linearization only for DSCP bleaching or ECN CE setting. And, as an additional optimisation, skip diffserv parsing entirely if it is not needed by the current configuration. Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> [ fix a few style issues, reflow commit message ] Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net_sched: fix tcm_parent in tc filter dumpCong Wang1-4/+4
[ Upstream commit a7df4870d79b00742da6cc93ca2f336a71db77f7 ] When we tell kernel to dump filters from root (ffff:ffff), those filters on ingress (ffff:0000) are matched, but their true parents must be dumped as they are. However, kernel dumps just whatever we tell it, that is either ffff:ffff or ffff:0000: $ nl-cls-list --dev=dummy0 --parent=root cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all $ nl-cls-list --dev=dummy0 --parent=ffff: cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all This is confusing and misleading, more importantly this is a regression since 4.15, so the old behavior must be restored. And, when tc filters are installed on a tc class, the parent should be the classid, rather than the qdisc handle. Commit edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto") removed the classid we save for filters, we can just restore this classid in tcf_block. Steps to reproduce this: ip li set dev dummy0 up tc qd add dev dummy0 ingress tc filter add dev dummy0 parent ffff: protocol arp basic action pass tc filter show dev dummy0 root Before this patch: filter protocol arp pref 49152 basic filter protocol arp pref 49152 basic handle 0x1 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 After this patch: filter parent ffff: protocol arp pref 49152 basic filter parent ffff: protocol arp pref 49152 basic handle 0x1 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 Fixes: a10fa20101ae ("net: sched: propagate q and parent from caller down to tcf_fill_node") Fixes: edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14sch_sfq: validate silly quantum valuesEric Dumazet1-0/+9
[ Upstream commit df4953e4e997e273501339f607b77953772e3559 ] syzbot managed to set up sfq so that q->scaled_quantum was zero, triggering an infinite loop in sfq_dequeue() More generally, we must only accept quantum between 1 and 2^18 - 7, meaning scaled_quantum must be in [1, 0x7FFF] range. Otherwise, we also could have a loop in sfq_dequeue() if scaled_quantum happens to be 0x8000, since slot->allot could indefinitely switch between 0 and 0x8000. Fixes: eeaeb068f139 ("sch_sfq: allow big packets and be fair") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14sch_choke: avoid potential panic in choke_reset()Eric Dumazet1-1/+2
[ Upstream commit 8738c85c72b3108c9b9a369a39868ba5f8e10ae0 ] If choke_init() could not allocate q->tab, we would crash later in choke_reset(). BUG: KASAN: null-ptr-deref in memset include/linux/string.h:366 [inline] BUG: KASAN: null-ptr-deref in choke_reset+0x208/0x340 net/sched/sch_choke.c:326 Write of size 8 at addr 0000000000000000 by task syz-executor822/7022 CPU: 1 PID: 7022 Comm: syz-executor822 Not tainted 5.7.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 __kasan_report.cold+0x5/0x4d mm/kasan/report.c:515 kasan_report+0x33/0x50 mm/kasan/common.c:625 check_memory_region_inline mm/kasan/generic.c:187 [inline] check_memory_region+0x141/0x190 mm/kasan/generic.c:193 memset+0x20/0x40 mm/kasan/common.c:85 memset include/linux/string.h:366 [inline] choke_reset+0x208/0x340 net/sched/sch_choke.c:326 qdisc_reset+0x6b/0x520 net/sched/sch_generic.c:910 dev_deactivate_queue.constprop.0+0x13c/0x240 net/sched/sch_generic.c:1138 netdev_for_each_tx_queue include/linux/netdevice.h:2197 [inline] dev_deactivate_many+0xe2/0xba0 net/sched/sch_generic.c:1195 dev_deactivate+0xf8/0x1c0 net/sched/sch_generic.c:1233 qdisc_graft+0xd25/0x1120 net/sched/sch_api.c:1051 tc_modify_qdisc+0xbab/0x1a00 net/sched/sch_api.c:1670 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5454 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmsg+0xec/0x1b0 net/socket.c:2449 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 Fixes: 77e62da6e60c ("sch_choke: drop all packets in queue during reset") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14net_sched: sch_skbprio: add message validation to skbprio_change()Eric Dumazet1-0/+3
[ Upstream commit 2761121af87de45951989a0adada917837d8fa82 ] Do not assume the attribute has the right size. Fixes: aea5f654e6b7 ("net/sched: add skbprio scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checksEric Dumazet1-1/+1
[ Upstream commit 14695212d4cd8b0c997f6121b6df8520038ce076 ] My intent was to not let users set a zero drop_batch_size, it seems I once again messed with min()/max(). Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29sched: etf: do not assume all sockets are full blownEric Dumazet1-3/+4
[ Upstream commit a1211bf9a7774706722ba3b18c6157d980319f79 ] skb->sk does not always point to a full blown socket, we need to use sk_fullsock() before accessing fields which only make sense on full socket. BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141 Read of size 1 at addr ffff88805eb9b245 by task syz-executor.5/9630 CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382 __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511 kasan_report+0x33/0x50 mm/kasan/common.c:625 report_sock_error+0x286/0x300 net/sched/sch_etf.c:141 etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170 __dev_xmit_skb net/core/dev.c:3710 [inline] __dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117 __ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143 ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176 dst_output include/net/dst.h:435 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280 tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521 tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916 inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline] reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405 expire_timers kernel/time/timer.c:1450 [inline] __run_timers kernel/time/timer.c:1774 [inline] __run_timers kernel/time/timer.c:1741 [inline] run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787 __do_softirq+0x26c/0x9f7 kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x192/0x1d0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792 Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40 RSP: 0018:ffffc90003b5f6c0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff10e4e55 RBX: 00000000d2f846d0 RCX: 00000000d2f846d0 RDX: 0000000012380612 RSI: ffffffff839863ca RDI: ffffffff887272a8 RBP: dffffc0000000000 R08: ffff888091d0a380 R09: 0000000000800081 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012 R13: ffff8880a8ae8078 R14: 00000000c545c93e R15: 0000000000000006 cipher_crypt_one crypto/cipher.c:75 [inline] crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82 crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830 crypto_shash_update+0xc4/0x120 crypto/shash.c:119 shash_ahash_update+0xa3/0x110 crypto/shash.c:246 crypto_ahash_update include/crypto/hash.h:547 [inline] hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x308/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmmsg+0x195/0x480 net/socket.c:2506 __do_sys_sendmmsg net/socket.c:2535 [inline] __se_sys_sendmmsg net/socket.c:2532 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x45c829 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f6d9528ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004fc080 RCX: 000000000045c829 RDX: 0000000000000001 RSI: 0000000020002640 RDI: 0000000000000004 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000008d7 R14: 00000000004cb7aa R15: 00007f6d9528f6d4 Fixes: 4b15c7075352 ("net/sched: Make etf report drops on error_queue") Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13net_sched: fix a missing refcnt in tcindex_init()Cong Wang1-0/+1
[ Upstream commit a8eab6d35e22f4f21471f16147be79529cd6aaf7 ] The initial refcnt of struct tcindex_data should be 1, it is clear that I forgot to set it to 1 in tcindex_init(). This leads to a dec-after-zero warning. Reported-by: syzbot+8325e509a1bf83ec741d@syzkaller.appspotmail.com Fixes: 304e024216a8 ("net_sched: add a temporary refcnt for struct tcindex_data") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13net_sched: add a temporary refcnt for struct tcindex_dataCong Wang1-6/+38
[ Upstream commit 304e024216a802a7dc8ba75d36de82fa136bbf3e ] Although we intentionally use an ordered workqueue for all tc filter works, the ordering is not guaranteed by RCU work, given that tcf_queue_work() is esstenially a call_rcu(). This problem is demostrated by Thomas: CPU 0: tcf_queue_work() tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work); -> Migration to CPU 1 CPU 1: tcf_queue_work(&p->rwork, tcindex_destroy_work); so the 2nd work could be queued before the 1st one, which leads to a free-after-free. Enforcing this order in RCU work is hard as it requires to change RCU code too. Fortunately we can workaround this problem in tcindex filter by taking a temporary refcnt, we only refcnt it right before we begin to destroy it. This simplifies the code a lot as a full refcnt requires much more changes in tcindex_set_parms(). Reported-by: syzbot+46f513c3033d592409d2@syzkaller.appspotmail.com Fixes: 3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()") Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} buildPablo Neira Ayuso1-4/+2
commit 2c64605b590edadb3fb46d1ec6badb49e940b479 upstream. net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’: net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’ pkt->skb->tc_redirected = 1; ^~ net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’ pkt->skb->tc_from_ingress = 1; ^~ To avoid a direct dependency with tc actions from netfilter, wrap the redirect bits around CONFIG_NET_REDIRECT and move helpers to include/linux/skbuff.h. Turn on this toggle from the ifb driver, the only existing client of these bits in the tree. This patch adds skb_set_redirected() that sets on the redirected bit on the skbuff, it specifies if the packet was redirect from ingress and resets the timestamp (timestamp reset was originally missing in the netfilter bugfix). Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress") Reported-by: noreply@ellerman.id.au Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>