summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-30net: add new socket option SO_RESERVE_MEMWei Wang5-7/+112
This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. Note: This socket option is only available when memory cgroup is enabled and we require this reserved memory to be charged to the user's memcg. We hope this could avoid mis-behaving users to abused this feature to reserve a large amount on certain sockets and cause unfairness for others. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: dev_addr_list: handle first address in __hw_addr_add_exJakub Kicinski1-0/+6
struct dev_addr_list is used for device addresses, unicast addresses and multicast addresses. The first of those needs special handling of the main address - netdev->dev_addr points directly the data of the entry and drivers write to it freely, so we can't maintain it in the rbtree (for now, at least, to be fixed in net-next). Current work around sprinkles special handling of the first address on the list throughout the code but it missed the case where address is being added. First address will not be visible during subsequent adds. Syzbot found a warning where unicast addresses are modified without holding the rtnl lock, tl;dr is that team generates the same modification multiple times, not necessarily when right locks are held. In the repro we have: macvlan -> team -> veth macvlan adds a unicast address to the team. Team then pushes that address down to its memebers (veths). Next something unrelated makes team sync member addrs again, and because of the bug the addr entries get duplicated in the veths. macvlan gets removed, removes its addr from team which removes only one of the duplicated addresses from veths. This removal is done under rtnl. Next syzbot uses iptables to add a multicast addr to team (which does not hold rtnl lock). Team syncs veth addrs, but because veths' unicast list still has the duplicate it will also get sync, even though this update is intended for mc addresses. Again, uc address updates need rtnl lock, boom. Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com Fixes: 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: phy: marvell10g: add downshift tunable supportRussell King1-1/+106
Add support for the downshift tunable for the Marvell 88x3310 PHY. Downshift is only usable with firmware 0.3.5.0 and later. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: sched: flower: protect fl_walk() with rcuVlad Buslov1-0/+6
Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul() also removed rcu protection of individual filters which causes following use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain rcu read lock while iterating and taking the filter reference and temporary release the lock while calling arg->fn() callback that can sleep. KASAN trace: [ 352.773640] ================================================================== [ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower] [ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987 [ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2 [ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 352.781022] Call Trace: [ 352.781573] dump_stack_lvl+0x46/0x5a [ 352.782332] print_address_description.constprop.0+0x1f/0x140 [ 352.783400] ? fl_walk+0x159/0x240 [cls_flower] [ 352.784292] ? fl_walk+0x159/0x240 [cls_flower] [ 352.785138] kasan_report.cold+0x83/0xdf [ 352.785851] ? fl_walk+0x159/0x240 [cls_flower] [ 352.786587] kasan_check_range+0x145/0x1a0 [ 352.787337] fl_walk+0x159/0x240 [cls_flower] [ 352.788163] ? fl_put+0x10/0x10 [cls_flower] [ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.790102] tcf_chain_dump+0x231/0x450 [ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170 [ 352.791833] ? __might_sleep+0x2e/0xc0 [ 352.792594] ? tfilter_notify+0x170/0x170 [ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.794477] tc_dump_tfilter+0x385/0x4b0 [ 352.795262] ? tc_new_tfilter+0x1180/0x1180 [ 352.796103] ? __mod_node_page_state+0x1f/0xc0 [ 352.796974] ? __build_skb_around+0x10e/0x130 [ 352.797826] netlink_dump+0x2c0/0x560 [ 352.798563] ? netlink_getsockopt+0x430/0x430 [ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.800542] __netlink_dump_start+0x356/0x440 [ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550 [ 352.802190] ? tc_new_tfilter+0x1180/0x1180 [ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.803668] ? tc_new_tfilter+0x1180/0x1180 [ 352.804344] ? _copy_from_iter_nocache+0x800/0x800 [ 352.805202] ? kasan_set_track+0x1c/0x30 [ 352.805900] netlink_rcv_skb+0xc6/0x1f0 [ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0 [ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.808324] ? netlink_ack+0x4d0/0x4d0 [ 352.809086] ? netlink_deliver_tap+0x62/0x3d0 [ 352.809951] netlink_unicast+0x353/0x480 [ 352.810744] ? netlink_attachskb+0x430/0x430 [ 352.811586] ? __alloc_skb+0xd7/0x200 [ 352.812349] netlink_sendmsg+0x396/0x680 [ 352.813132] ? netlink_unicast+0x480/0x480 [ 352.813952] ? __import_iovec+0x192/0x210 [ 352.814759] ? netlink_unicast+0x480/0x480 [ 352.815580] sock_sendmsg+0x6c/0x80 [ 352.816299] ____sys_sendmsg+0x3a5/0x3c0 [ 352.817096] ? kernel_sendmsg+0x30/0x30 [ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150 [ 352.818753] ___sys_sendmsg+0xd8/0x140 [ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110 [ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0 [ 352.821110] ? __copy_msghdr_from_user+0x260/0x260 [ 352.821934] ? _raw_spin_lock+0x81/0xd0 [ 352.822680] ? __handle_mm_fault+0xef3/0x1b20 [ 352.823549] ? rb_insert_color+0x2a/0x270 [ 352.824373] ? copy_page_range+0x16b0/0x16b0 [ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0 [ 352.826190] ? __fget_light+0xd9/0xf0 [ 352.826941] __sys_sendmsg+0xb3/0x130 [ 352.827613] ? __sys_sendmsg_sock+0x20/0x20 [ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0 [ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60 [ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160 [ 352.830845] do_syscall_64+0x35/0x80 [ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.832331] RIP: 0033:0x7f7bee973c17 [ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17 [ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003 [ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40 [ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f [ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088 [ 352.843784] Allocated by task 2960: [ 352.844451] kasan_save_stack+0x1b/0x40 [ 352.845173] __kasan_kmalloc+0x7c/0x90 [ 352.845873] fl_change+0x282/0x22db [cls_flower] [ 352.846696] tc_new_tfilter+0x6cf/0x1180 [ 352.847493] rtnetlink_rcv_msg+0x471/0x550 [ 352.848323] netlink_rcv_skb+0xc6/0x1f0 [ 352.849097] netlink_unicast+0x353/0x480 [ 352.849886] netlink_sendmsg+0x396/0x680 [ 352.850678] sock_sendmsg+0x6c/0x80 [ 352.851398] ____sys_sendmsg+0x3a5/0x3c0 [ 352.852202] ___sys_sendmsg+0xd8/0x140 [ 352.852967] __sys_sendmsg+0xb3/0x130 [ 352.853718] do_syscall_64+0x35/0x80 [ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.855830] Freed by task 7: [ 352.856421] kasan_save_stack+0x1b/0x40 [ 352.857139] kasan_set_track+0x1c/0x30 [ 352.857854] kasan_set_free_info+0x20/0x30 [ 352.858609] __kasan_slab_free+0xed/0x130 [ 352.859348] kfree+0xa7/0x3c0 [ 352.859951] process_one_work+0x44d/0x780 [ 352.860685] worker_thread+0x2e2/0x7e0 [ 352.861390] kthread+0x1f4/0x220 [ 352.862022] ret_from_fork+0x1f/0x30 [ 352.862955] Last potentially related work creation: [ 352.863758] kasan_save_stack+0x1b/0x40 [ 352.864378] kasan_record_aux_stack+0xab/0xc0 [ 352.865028] insert_work+0x30/0x160 [ 352.865617] __queue_work+0x351/0x670 [ 352.866261] rcu_work_rcufn+0x30/0x40 [ 352.866917] rcu_core+0x3b2/0xdb0 [ 352.867561] __do_softirq+0xf6/0x386 [ 352.868708] Second to last potentially related work creation: [ 352.869779] kasan_save_stack+0x1b/0x40 [ 352.870560] kasan_record_aux_stack+0xab/0xc0 [ 352.871426] call_rcu+0x5f/0x5c0 [ 352.872108] queue_rcu_work+0x44/0x50 [ 352.872855] __fl_put+0x17c/0x240 [cls_flower] [ 352.873733] fl_delete+0xc7/0x100 [cls_flower] [ 352.874607] tc_del_tfilter+0x510/0xb30 [ 352.886085] rtnetlink_rcv_msg+0x471/0x550 [ 352.886875] netlink_rcv_skb+0xc6/0x1f0 [ 352.887636] netlink_unicast+0x353/0x480 [ 352.888285] netlink_sendmsg+0x396/0x680 [ 352.888942] sock_sendmsg+0x6c/0x80 [ 352.889583] ____sys_sendmsg+0x3a5/0x3c0 [ 352.890311] ___sys_sendmsg+0xd8/0x140 [ 352.891019] __sys_sendmsg+0xb3/0x130 [ 352.891716] do_syscall_64+0x35/0x80 [ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.893666] The buggy address belongs to the object at ffff8881c8251000 which belongs to the cache kmalloc-2k of size 2048 [ 352.895696] The buggy address is located 1152 bytes inside of 2048-byte region [ffff8881c8251000, ffff8881c8251800) [ 352.897640] The buggy address belongs to the page: [ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250 [ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0 [ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00 [ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 352.905861] page dumped because: kasan: bad access detected [ 352.907323] Memory state around the buggy address: [ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.912012] ^ [ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.915185] ================================================================== Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30octeontx2-af: Remove redundant initialization of variable pinColin Ian King1-1/+1
The variable pin is being initialized with a value that is never read, it is being updated later on in only one case of a switch statement. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: macb: ptp: Switch to gettimex64() interfaceLars-Peter Clausen1-4/+9
The macb PTP support currently implements the `gettime64` callback to allow to retrieve the hardware clock time. Update the implementation to provide the `gettimex64` callback instead. The difference between the two is that with `gettime64` a snapshot of the system clock is taken before and after invoking the callback. Whereas `gettimex64` expects the callback itself to take the snapshots. To get the time from the macb Ethernet core multiple register accesses have to be done. Only one of which will happen at the time reported by the function. This leads to a non-symmetric delay and adds a slight offset between the hardware and system clock time when using the `gettime64` method. This offset can be a few 100 nanoseconds. Switching to the `gettimex64` method allows for a more precise correlation of the hardware and system clocks and results in a lower offset between the two. On a Xilinx ZynqMP system `phc2sys` reports a delay of 1120 ns before and 300 ns after the patch. With the latter being mostly symmetric. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30dissector: do not set invalid PPP protocolBoris Sukholitko1-2/+1
The following flower filter fails to match non-PPP_IP{V6} packets wrapped in PPP_SES protocol: tc filter add dev eth0 ingress protocol ppp_ses flower \ action simple sdata hi64 The reason is that proto local variable is being set even when FLOW_DISSECT_RET_OUT_BAD status is returned. The fix is to avoid setting proto variable if the PPP protocol is unknown. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: dsa: rtl8366rb: Use core filtering trackingLinus Walleij1-7/+2
We added a state variable to track whether a certain port was VLAN filtering or not, but we can just inquire the DSA core about this. Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Alvin Šipraga <alsi@bang-olufsen.dk> Cc: Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: introduce and use lock_sock_fast_nested()Paolo Abeni3-20/+33
Syzkaller reported a false positive deadlock involving the nl socket lock and the subflow socket lock: MPTCP: kernel_bind error, err=-98 ============================================ WARNING: possible recursive locking detected 5.15.0-rc1-syzkaller #0 Not tainted -------------------------------------------- syz-executor998/6520 is trying to acquire lock: ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 but task is already holding lock: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-sk_lock-AF_INET); lock(k-sk_lock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor998/6520: #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline] #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 stack backtrace: CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline] check_deadlock kernel/locking/lockdep.c:2987 [inline] validate_chain kernel/locking/lockdep.c:3776 [inline] __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_fast+0x36/0x100 net/core/sock.c:3229 mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 __sock_release net/socket.c:649 [inline] sock_release+0x87/0x1b0 net/socket.c:677 mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900 mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_no_sendpage+0x101/0x150 net/core/sock.c:2980 kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504 kernel_sendpage net/socket.c:3501 [inline] sock_sendpage+0xe5/0x140 net/socket.c:1003 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0xd4/0x140 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x110/0x180 fs/splice.c:936 splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891 do_splice_direct+0x1b3/0x280 fs/splice.c:979 do_sendfile+0xae9/0x1240 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1314 [inline] __se_sys_sendfile64 fs/read_write.c:1300 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f215cb69969 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08 R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 the problem originates from uncorrect lock annotation in the mptcp code and is only visible since commit 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations"), but is present since the port-based endpoint support initial implementation. This patch addresses the issue introducing a nested variant of lock_sock_fast() and using it in the relevant code path. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30octeontx2-pf: Add XDP support to netdev PFGeetha sowjanya6-34/+322
Adds XDP_PASS, XDP_TX, XDP_DROP and XDP_REDIRECT support for netdev PF. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30octeontx2-af: Adjust LA pointer for cpt parse headerKiran Kumar K2-95/+80
In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the start of cpt parse header. Since cpt parse header has veriable length padding, this will be a problem for DMAC extraction. Adding KPU profile changes to adjust the LA pointer to start at ether header in case of cpt parse header by - Adding ptr advance in pkind 58 to a fixed value 40 - Adding variable length offset 7 and mask 7 (pad len in CPT_PARSE_HDR). Also added the missing static declaration for npc_set_var_len_offset_pkind function. Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30libbpf: Fix skel_internal.h to set errno on loader retval < 0Kumar Kartikeya Dwivedi1-2/+4
When the loader indicates an internal error (result of a checked bpf system call), it returns the result in attr.test.retval. However, tests that rely on ASSERT_OK_PTR on NULL (returned from light skeleton) may miss that NULL denotes an error if errno is set to 0. This would result in skel pointer being NULL, while ASSERT_OK_PTR returning 1, leading to a SEGV on dereference of skel, because libbpf_get_error relies on the assumption that errno is always set in case of error for ptr == NULL. In particular, this was observed for the ksyms_module test. When executed using `./test_progs -t ksyms`, prior tests manipulated errno and the test didn't crash when it failed at ksyms_module load, while using `./test_progs -t ksyms_module` crashed due to errno being untouched. Fixes: 67234743736a (libbpf: Generate loader program out of BPF ELF file.) Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210927145941.1383001-11-memxor@gmail.com
2021-09-30net_sched: Use struct_size() and flex_array_size() helpersGustavo A. R. Silva1-3/+4
Make use of the struct_size() and flex_array_size() helpers instead of an open-coded version, in order to avoid any potential type mistakes or integer overflows that, in the worse scenario, could lead to heap overflows. Link: https://github.com/KSPP/linux/issues/160 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20210928193107.GA262595@embeddedor Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30libbpf: Properly ignore STT_SECTION symbols in legacy map definitionsToke Høiland-Jørgensen1-0/+2
The previous patch to ignore STT_SECTION symbols only added the ignore condition in one of them. This fails if there's more than one map definition in the 'maps' section, because the subsequent modulus check will fail, resulting in error messages like: libbpf: elf: unable to determine legacy map definition size in ./xdpdump_xdp.o Fix this by also ignoring STT_SECTION in the first loop. Fixes: c3e8c44a9063 ("libbpf: Ignore STT_SECTION symbols in 'maps' section") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210929213837.832449-1-toke@redhat.com
2021-09-30bpf: Do not invoke the XDP dispatcher for PROG_RUN with single repeatLorenz Bauer1-2/+4
We have a unit test that invokes an XDP program with 1m different inputs, aka 1m BPF_PROG_RUN syscalls. We run this test concurrently with slight variations in how we generated the input. Since commit f23c4b3924d2 ("bpf: Start using the BPF dispatcher in BPF_TEST_RUN") the unit test has slowed down significantly. Digging deeper reveals that the concurrent tests are serialised in the kernel on the XDP dispatcher. This is a global resource that is protected by a mutex, on which we contend. Fix this by not calling into the XDP dispatcher if we only want to perform a single run of the BPF program. See: https://lore.kernel.org/bpf/CACAyw9_y4QumOW35qpgTbLsJ532uGq-kVW-VESJzGyiZkypnvw@mail.gmail.com/ Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210928093100.27124-1-lmb@cloudflare.com
2021-09-29Bluetooth: hci_vhci: Add force_prevent_wake entryLuiz Augusto von Dentz1-0/+49
This adds force_prevent_wake which can be used to force a certain state while interacting with force_suspend. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-09-29Bluetooth: hci_vhci: Add force_suspend entryLuiz Augusto von Dentz1-0/+53
This adds force_suspend which can be used to force the controller into suspend/resume state. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-09-29libbpf: Make gen_loader data aligned.Alexei Starovoitov1-1/+6
Align gen_loader data to 8 byte boundary to make sure union bpf_attr, bpf_insns and other structs are aligned. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210927145941.1383001-9-memxor@gmail.com
2021-09-29bpf: selftests: Fix fd cleanup in get_branch_snapshotKumar Kartikeya Dwivedi1-3/+2
Cleanup code uses while (cpu++ < cpu_cnt) for closing fds, which means it starts iterating from 1 for closing fds. If the first fd is -1, it skips over it and closes garbage fds (typically zero) in the remaining array. This leads to test failures for future tests when they end up storing fd 0 (as the slot becomes free due to close(0)) in ldimm64's BTF fd, ending up trying to match module BTF id with vmlinux. This was observed as spurious CI failure for the ksym_module_libbpf and module_attach tests. The test ends up closing fd 0 and breaking libbpf's assumption that module BTF fd will always be > 0, which leads to the kernel thinking that we are pointing to a BTF ID in vmlinux BTF. Fixes: 025bd7c753aa (selftests/bpf: Add test for bpf_get_branch_snapshot) Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210927145941.1383001-12-memxor@gmail.com
2021-09-29MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainerMun Yew Tham1-1/+1
Update Altera Pio Driver maintainer's email from <joyce.ooi@intel.com> to <mun.yew.tham@intel.com> Signed-off-by: Mun Yew Tham <mun.yew.tham@intel.com> Acked-by: Joyce Ooi <joyce.ooi@intel.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-29MAINTAINERS: update my email addressBartosz Golaszewski1-4/+4
My professional situation changes soon. Update my email address. Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2021-09-29gpio: pca953x: do not ignore i2c errorsAndrey Gusakov1-9/+2
Per gpio_chip interface, error shall be proparated to the caller. Attempt to silent diagnostics by returning zero (as written in the comment) is plain wrong, because the zero return can be interpreted by the caller as the gpio value. Cc: stable@vger.kernel.org Signed-off-by: Andrey Gusakov <andrey.gusakov@cogentembedded.com> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-09-29devlink: Add missed notifications iteratorsLeon Romanovsky1-2/+21
The commit mentioned in Fixes line missed a couple of notifications that were registered before devlink_register() and should be delayed too. As such, the too early placed WARN_ON() check spotted it. WARNING: CPU: 1 PID: 6540 at net/core/devlink.c:5158 devlink_nl_region_notify+0x184/0x1e0 net/core/devlink.c:5158 Modules linked in: CPU: 1 PID: 6540 Comm: syz-executor.0 Not tainted 5.15.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:devlink_nl_region_notify+0x184/0x1e0 net/core/devlink.c:5158 Code: 38 41 b8 c0 0c 00 00 31 d2 48 89 ee 4c 89 e7 e8 72 1a 26 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e e9 01 bd 41 fa e8 fc bc 41 fa <0f> 0b e9 f7 fe ff ff e8 f0 bc 41 fa 0f 0b eb da 4c 89 e7 e8 c4 18 RSP: 0018:ffffc90002d6f658 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88801f08d580 RSI: ffffffff87344e94 RDI: 0000000000000003 RBP: ffff88801ee42100 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff87344d8a R11: 0000000000000000 R12: ffff88801c1dc000 R13: 0000000000000000 R14: 000000000000002c R15: ffff88801c1dc070 FS: 0000555555e8e400(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055dd7c590310 CR3: 0000000069a09000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: devlink_region_create+0x39f/0x4c0 net/core/devlink.c:10327 nsim_dev_dummy_region_init drivers/net/netdevsim/dev.c:481 [inline] nsim_dev_probe+0x5f6/0x1150 drivers/net/netdevsim/dev.c:1479 call_driver_probe drivers/base/dd.c:517 [inline] really_probe+0x245/0xcc0 drivers/base/dd.c:596 __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:751 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:781 __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:898 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427 __device_attach+0x228/0x4a0 drivers/base/dd.c:969 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487 device_add+0xc35/0x21b0 drivers/base/core.c:3359 nsim_bus_dev_new drivers/net/netdevsim/bus.c:435 [inline] new_device_store+0x48b/0x770 drivers/net/netdevsim/bus.c:302 bus_attr_store+0x72/0xa0 drivers/base/bus.c:122 sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139 kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296 call_write_iter include/linux/fs.h:2163 [inline] new_sync_write+0x429/0x660 fs/read_write.c:507 vfs_write+0x7cf/0xae0 fs/read_write.c:594 ksys_write+0x12d/0x250 fs/read_write.c:647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f328409d3ef Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48 RSP: 002b:00007ffdc6851140 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f328409d3ef RDX: 0000000000000003 RSI: 00007ffdc6851190 RDI: 0000000000000004 RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffdc68510e0 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3284144971 R13: 00007ffdc6851190 R14: 0000000000000000 R15: 00007ffdc6851860 Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/2ed1159291f2a589b013914f2b60d8172fc525c1.1632925030.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-29Merge tag 'sound-5.15-rc4' of ↵Linus Torvalds30-102/+272
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became a slightly large collection of changes, partly because I've been off in the last weeks. Most of changes are small and scattered while a bit big change is found in HD-audio Realtek codec driver; it's a very device-specific fix that has been long wanted, so I decided to pick up although it's in the middle RC. Some highlights: - A new guard ioctl for ALSA rawmidi API to avoid the misuse of the new timestamp framing mode; it's for a regression fix - HD-audio: a revert of the 5.15 change that might work badly, new quirks for Lenovo Legion & co, a follow-up fix for CS8409 - ASoC: lots of SOF-related fixes, fsl component fixes, corrections of mediatek drivers - USB-audio: fix for the PM resume - FireWire: oxfw and motu fixes" * tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits) ALSA: pcsp: Make hrtimer forwarding more robust ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSION ALSA: firewire-motu: fix truncated bytes in message tracepoints ASoC: SOF: trace: Omit error print when waking up trace sleepers ASoC: mediatek: mt8195: remove wrong fixup assignment on HDMITX ASoC: SOF: loader: Re-phrase the missing firmware error to avoid duplication ASoC: SOF: loader: release_firmware() on load failure to avoid batching ALSA: hda/cs8409: Setup Dolphin Headset Mic as Phantom Jack ALSA: pcxhr: "fix" PCXHR_REG_TO_PORT definition ASoC: SOF: imx: imx8m: Bar index is only valid for IRAM and SRAM types ASoC: SOF: imx: imx8: Bar index is only valid for IRAM and SRAM types ASoC: SOF: Fix DSP oops stack dump output contents ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i 15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops. ALSA: usb-audio: Unify mixer resume and reset_resume procedure Revert "ALSA: hda: Drop workaround for a hang at shutdown again" ALSA: oxfw: fix transmission method for Loud models based on OXFW971 ASoC: mediatek: common: handle NULL case in suspend/resume function ASoC: fsl_xcvr: register platform component before registering cpu dai ASoC: fsl_spdif: register platform component before registering cpu dai ASoC: fsl_micfil: register platform component before registering cpu dai ...
2021-09-29Merge branch 'linus' of ↵Linus Torvalds2-8/+11
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This contains fixes for a resource leak in ccp as well as stack corruption in x86/sm4" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sm4 - Fix frame pointer stack corruption crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()
2021-09-29Bluetooth: Make use of hci_{suspend,resume}_dev on suspend notifierLuiz Augusto von Dentz1-49/+67
This moves code from hci_suspend_notifier to hci_{suspend,resume}_dev since some driver may handle pm directly using HCI_QUIRK_NO_SUSPEND_NOTIFIER they would instead call hci_{suspend,resume}_dev directly and we want that to have the same behavior regardless of where pm is being handled. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-09-29xsk: Fix clang build error in __xp_allocMagnus Karlsson1-1/+0
Fix a build error with clang in __xp_alloc(): [...] net/xdp/xsk_buff_pool.c:465:15: error: variable 'xskb' is uninitialized when used here [-Werror,-Wuninitialized] xp_release(xskb); ^~~~ This is correctly detected by clang, but not gcc. In fact, the xp_release() statement should not be there at all in the refactored code, just remove it. Fixes: 94033cd8e73b ("xsk: Optimize for aligned case") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210929061403.8587-1-magnus.karlsson@gmail.com
2021-09-29gve: Use kvcalloc() instead of kvzalloc()Gustavo A. R. Silva1-8/+7
Use 2-factor argument form kvcalloc() instead of kvzalloc(). Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net/ipv4/datagram.c: remove superfluous header files from datagram.cMianhan Liu1-1/+0
datagram.c hasn't use any macro or function declared in linux/ip.h. Thus, these files can be removed from datagram.c safely without affecting the compilation of the net/ipv4 module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net/dsa/tag_ksz.c: remove superfluous headersMianhan Liu1-1/+0
tag_ksz.c hasn't use any macro or function declared in linux/slab.h. Thus, these files can be removed from tag_ksz.c safely without affecting the compilation of the ./net/dsa module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net/dsa/tag_8021q.c: remove superfluous headersMianhan Liu1-1/+0
tag_8021q.c hasn't use any macro or function declared in linux/if_bridge.h. Thus, these files can be removed from tag_8021q.c safely without affecting the compilation of the ./net/dsa module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: phy: bcm7xxx: Fixed indirect MMD operationsFlorian Fainelli1-4/+110
When EEE support was added to the 28nm EPHY it was assumed that it would be able to support the standard clause 45 over clause 22 register access method. It turns out that the PHY does not support that, which is the very reason for using the indirect shadow mode 2 bank 3 access method. Implement {read,write}_mmd to allow the standard PHY library routines pertaining to EEE querying and configuration to work correctly on these PHYs. This forces us to implement a __phy_set_clr_bits() function that does not grab the MDIO bus lock since the PHY driver's {read,write}_mmd functions are always called with that lock held. Fixes: 83ee102a6998 ("net: phy: bcm7xxx: add support for 28nm EPHY") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net/mlx4: Use array_size() helper in copy_to_user()Gustavo A. R. Silva1-1/+2
Use array_size() helper instead of the open-coded version in copy_to_user(). These sorts of multiplication factors need to be wrapped in array_size(). Link: https://github.com/KSPP/linux/issues/160 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: bridge: Use array_size() helper in copy_to_user()Gustavo A. R. Silva1-3/+5
Use array_size() helper instead of the open-coded version in copy_to_user(). These sorts of multiplication factors need to be wrapped in array_size(). Link: https://github.com/KSPP/linux/issues/160 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29ethtool: ioctl: Use array_size() helper in copy_{from,to}_user()Gustavo A. R. Silva1-5/+7
Use array_size() helper instead of the open-coded version in copy_{from,to}_user(). These sorts of multiplication factors need to be wrapped in array_size(). Link: https://github.com/KSPP/linux/issues/160 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29Merge branch 'hns3-fixes'David S. Miller7-73/+60
Guangbin Huang says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: disable firmware compatible features when uninstall PFGuangbin Huang1-8/+13
Currently, the firmware compatible features are enabled in PF driver initialization process, but they are not disabled in PF driver deinitialization process and firmware keeps these features in enabled status. In this case, if load an old PF driver (for example, in VM) which not support the firmware compatible features, firmware will still send mailbox message to PF when link status changed and PF will print "un-supported mailbox message, code = 201". To fix this problem, disable these firmware compatible features in PF driver deinitialization process. Fixes: ed8fb4b262ae ("net: hns3: add link change event report") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: fix always enable rx vlan filter problem after selftestGuangbin Huang1-2/+4
Currently, the rx vlan filter will always be disabled before selftest and be enabled after selftest as the rx vlan filter feature is fixed on in old device earlier than V3. However, this feature is not fixed in some new devices and it can be disabled by user. In this case, it is wrong if rx vlan filter is enabled after selftest. So fix it. Fixes: bcc26e8dc432 ("net: hns3: remove unused code in hns3_self_test()") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: PF enable promisc for VF when mac table is overflowGuangbin Huang1-2/+6
If unicast mac address table is full, and user add a new mac address, the unicast promisc needs to be enabled for the new unicast mac address can be used. So does the multicast promisc. Now this feature has been implemented for PF, and VF should be implemented too. When the mac table of VF is overflow, PF will enable promisc for this VF. Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: fix show wrong state when add existing uc mac addressJian Shen1-10/+9
Currently, if function adds an existing unicast mac address, eventhough driver will not add this address into hardware, but it will return 0 in function hclge_add_uc_addr_common(). It will cause the state of this unicast mac address is ACTIVE in driver, but it should be in TO-ADD state. To fix this problem, function hclge_add_uc_addr_common() returns -EEXIST if mac address is existing, and delete two error log to avoid printing them all the time after this modification. Fixes: 72110b567479 ("net: hns3: return 0 and print warning when hit duplicate MAC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLEJian Shen2-28/+10
HCLGE_FLAG_MQPRIO_ENABLE is supposed to set when enable multiple TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is supposed to set when enable multiple TCs with ets. But the driver mixed the flags when updating the tm configuration. Furtherly, PFC should be available when HCLGE_FLAG_MQPRIO_ENABLE too, so remove the unnecessary limitation. Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: don't rollback when destroy mqprio failJian Shen1-6/+11
For destroy mqprio is irreversible in stack, so it's unnecessary to rollback the tc configuration when destroy mqprio failed. Otherwise, it may cause the configuration being inconsistent between driver and netstack. As the failure is usually caused by reset, and the driver will restore the configuration after reset, so it can keep the configuration being consistent between driver and hardware. Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: remove tc enable checkingJian Shen4-17/+2
Currently, in function hns3_nic_set_real_num_queue(), the driver doesn't report the queue count and offset for disabled tc. If user enables multiple TCs, but only maps user priorities to partial of them, it may cause the queue range of the unmapped TC being displayed abnormally. Fix it by removing the tc enable checking, ensure the queue count is not zero. With this change, the tc_en is useless now, so remove it. Fixes: a75a8efa00c5 ("net: hns3: Fix tc setup when netdev is first up") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: hns3: do not allow call hns3_nic_net_open repeatedlyJian Shen1-0/+5
hns3_nic_net_open() is not allowed to called repeatly, but there is no checking for this. When doing device reset and setup tc concurrently, there is a small oppotunity to call hns3_nic_net_open repeatedly, and cause kernel bug by calling napi_enable twice. The calltrace information is like below: [ 3078.222780] ------------[ cut here ]------------ [ 3078.230255] kernel BUG at net/core/dev.c:6991! [ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G O 5.14.0-rc4+ #1 [ 3078.269102] Hardware name: , BIOS KpxxxFPGA 1P B600 V181 08/12/2021 [ 3078.276801] Workqueue: hclge hclge_service_task [hclge] [ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 3078.296168] pc : napi_enable+0x80/0x84 tc qdisc sho[w 3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3] [ 3078.314771] sp : ffff8000108abb20 [ 3078.319099] x29: ffff8000108abb20 x28: 0000000000000000 x27: ffff0820a8490300 [ 3078.329121] x26: 0000000000000001 x25: ffff08209cfc6200 x24: 0000000000000000 [ 3078.339044] x23: ffff0820a8490300 x22: ffff08209cd76000 x21: ffff0820abfe3880 [ 3078.349018] x20: 0000000000000000 x19: ffff08209cd76900 x18: 0000000000000000 [ 3078.358620] x17: 0000000000000000 x16: ffffc816e1727a50 x15: 0000ffff8f4ff930 [ 3078.368895] x14: 0000000000000000 x13: 0000000000000000 x12: 0000259e9dbeb6b4 [ 3078.377987] x11: 0096a8f7e764eb40 x10: 634615ad28d3eab5 x9 : ffffc816ad8885b8 [ 3078.387091] x8 : ffff08209cfc6fb8 x7 : ffff0820ac0da058 x6 : ffff0820a8490344 [ 3078.396356] x5 : 0000000000000140 x4 : 0000000000000003 x3 : ffff08209cd76938 [ 3078.405365] x2 : 0000000000000000 x1 : 0000000000000010 x0 : ffff0820abfe38a0 [ 3078.414657] Call trace: [ 3078.418517] napi_enable+0x80/0x84 [ 3078.424626] hns3_reset_notify_up_enet+0x78/0xd0 [hns3] [ 3078.433469] hns3_reset_notify+0x64/0x80 [hns3] [ 3078.441430] hclge_notify_client+0x68/0xb0 [hclge] [ 3078.450511] hclge_reset_rebuild+0x524/0x884 [hclge] [ 3078.458879] hclge_reset_service_task+0x3c4/0x680 [hclge] [ 3078.467470] hclge_service_task+0xb0/0xb54 [hclge] [ 3078.475675] process_one_work+0x1dc/0x48c [ 3078.481888] worker_thread+0x15c/0x464 [ 3078.487104] kthread+0x160/0x170 [ 3078.492479] ret_from_fork+0x10/0x18 [ 3078.498785] Code: c8027c81 35ffffa2 d50323bf d65f03c0 (d4210000) [ 3078.506889] ---[ end trace 8ebe0340a1b0fb44 ]--- Once hns3_nic_net_open() is excute success, the flag HNS3_NIC_STATE_DOWN will be cleared. So add checking for this flag, directly return when HNS3_NIC_STATE_DOWN is no set. Fixes: e888402789b9 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29Merge branch 'mctp-core-updates'David S. Miller8-77/+431
Matt Johnston says: ==================== Updates to MCTP core This series adds timeouts for MCTP tags (a limited resource), and a few other improvements to the MCTP core. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: Warn if pointer is set for a wrong dev typeMatt Johnston1-7/+24
Should not occur but is a sanity check. May help tracking down Trinity reported issue https://lore.kernel.org/lkml/20210913030701.GA5926@xsang-OptiPlex-9020/ Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: Set route MTU via netlinkMatt Johnston1-1/+13
A route's RTAX_MTU can be set in nested RTAX_METRICS Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29doc/mctp: Add a little detail about kernel internalsJeremy Kerr1-0/+59
Describe common flows and refcounting behaviour. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: Do inits as a subsys_initcallJeremy Kerr1-1/+1
In a future change, we'll want to provide a registration call for mctp-specific devices. This requires us to have the networks established before device driver inits, so run the core init as a subsys_initcall. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: Add tracepoints for tag/key handlingJeremy Kerr3-1/+92
The tag allocation, release and bind events are somewhat opaque outside the kernel; this change adds a few tracepoints to assist in instrumentation and debugging. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>