summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-07net: dsa: mv88e6xxx: Fix off by in one in mv88e6185_phylink_get_caps()Dan Carpenter1-1/+1
The <= ARRAY_SIZE() needs to be < ARRAY_SIZE() to prevent an out of bounds error. Fixes: d4ebf12bcec4 ("net: dsa: mv88e6xxx: populate supported_interfaces and mac_capabilities") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: hns3: add support for TX push modeYufeng Mo8-8/+118
For the device that supports the TX push capability, the BD can be directly copied to the device memory. However, due to hardware restrictions, the push mode can be used only when there are no more than two BDs, otherwise, the doorbell mode based on device memory is used. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: asix: add proper error handling of usb read errorsPavel Skripkin3-11/+33
Syzbot once again hit uninit value in asix driver. The problem still the same -- asix_read_cmd() reads less bytes, than was requested by caller. Since all read requests are performed via asix_read_cmd() let's catch usb related error there and add __must_check notation to be sure all callers actually check return value. So, this patch adds sanity check inside asix_read_cmd(), that simply checks if bytes read are not less, than was requested and adds missing error handling of asix_read_cmd() all across the driver code. Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter") Reported-and-tested-by: syzbot+6ca9f7867b77c2d316ac@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07r8169: factor out redundant RTL8168d PHY config functionality to ↵Heiner Kallweit1-46/+25
rtl8168d_1_common() rtl8168d_2_hw_phy_config() shares quite some functionality with rtl8168d_1_hw_phy_config(), so let's factor out the common part to a new function rtl8168d_1_common(). In addition improve the code a little. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07ip6mr: fix use-after-free in ip6mr_sk_done()Eric Dumazet2-3/+7
Apparently addrconf_exit_net() is called before igmp6_net_exit() and ndisc_net_exit() at netns dismantle time: net_namespace: call ip6table_mangle_net_exit() net_namespace: call ip6_tables_net_exit() net_namespace: call ipv6_sysctl_net_exit() net_namespace: call ioam6_net_exit() net_namespace: call seg6_net_exit() net_namespace: call ping_v6_proc_exit_net() net_namespace: call tcpv6_net_exit() ip6mr_sk_done sk=ffffa354c78a74c0 net_namespace: call ipv6_frags_exit_net() net_namespace: call addrconf_exit_net() net_namespace: call ip6addrlbl_net_exit() net_namespace: call ip6_flowlabel_net_exit() net_namespace: call ip6_route_net_exit_late() net_namespace: call fib6_rules_net_exit() net_namespace: call xfrm6_net_exit() net_namespace: call fib6_net_exit() net_namespace: call ip6_route_net_exit() net_namespace: call ipv6_inetpeer_exit() net_namespace: call if6_proc_net_exit() net_namespace: call ipv6_proc_exit_net() net_namespace: call udplite6_proc_exit_net() net_namespace: call raw6_exit_net() net_namespace: call igmp6_net_exit() ip6mr_sk_done sk=ffffa35472b2a180 ip6mr_sk_done sk=ffffa354c78a7980 net_namespace: call ndisc_net_exit() ip6mr_sk_done sk=ffffa35472b2ab00 net_namespace: call ip6mr_net_exit() net_namespace: call inet6_net_exit() This was fine because ip6mr_sk_done() would not reach the point decreasing net->ipv6.devconf_all->mc_forwarding until my patch in ip6mr_sk_done(). To fix this without changing struct pernet_operations ordering, we can clear net->ipv6.devconf_dflt and net->ipv6.devconf_all when they are freed from addrconf_exit_net() BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 Read of size 4 at addr ffff88801ff08688 by task kworker/u4:4/963 CPU: 0 PID: 963 Comm: kworker/u4:4 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 rawv6_close+0x58/0x80 net/ipv6/raw.c:1201 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release net/socket.c:650 [inline] sock_release+0x87/0x1b0 net/socket.c:678 inet_ctl_sock_destroy include/net/inet_common.h:65 [inline] igmp6_net_exit+0x6b/0x170 net/ipv6/mcast.c:3173 ops_exit_list+0xb0/0x170 net/core/net_namespace.c:168 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:600 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: f2f2325ec799 ("ip6mr: ip6mr_sk_done() can exit early in common cases") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07caif: cleanup double word in commentTom Rix1-1/+1
Replace the second 'so' with 'free'. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07Merge branch 'mlxsw-dip-sip-mangling'David S. Miller6-9/+394
Ido Schimmel says: ==================== mlxsw: Add SIP and DIP mangling support Danielle says: On Spectrum-2 onwards, it is possible to overwrite SIP and DIP address of an IPv4 or IPv6 packet in the ACL engine. That corresponds to pedit munges of, respectively, ip src and ip dst fields, and likewise for ip6. Offload these munges on the systems where they are supported. Patchset overview: Patch #1: introduces SIP_DIP_ACTION and its fields. Patch #2-#3: adds the new pedit fields, and dispatches on them on Spectrum-2 and above. Patch #4 adds a selftest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07selftests: forwarding: Add a test for pedit munge SIP and DIPDanielle Ratson1-0/+201
Add a test that checks that pedit adjusts source and destination addresses of IPv4 and IPv6 packets. Output example: $ ./pedit_ip.sh TEST: ping [ OK ] TEST: ping6 [ OK ] TEST: dev swp2 ingress pedit ip src set 198.51.100.1 [ OK ] TEST: dev swp3 egress pedit ip src set 198.51.100.1 [ OK ] TEST: dev swp2 ingress pedit ip dst set 198.51.100.1 [ OK ] TEST: dev swp3 egress pedit ip dst set 198.51.100.1 [ OK ] TEST: dev swp2 ingress pedit ip6 src set 2001:db8:2::1 [ OK ] TEST: dev swp3 egress pedit ip6 src set 2001:db8:2::1 [ OK ] TEST: dev swp2 ingress pedit ip6 dst set 2001:db8:2::1 [ OK ] TEST: dev swp3 egress pedit ip6 dst set 2001:db8:2::1 [ OK ] Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv6 addressesDanielle Ratson3-11/+103
Spectrum-2 supports an ACL action SIP_DIP, which allows IPv4 and IPv6 source and destination addresses change. Offload suitable mangles to the IPv6 address change action. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv4 addressesDanielle Ratson3-0/+47
Spectrum-2 supports an ACL action SIP_DIP, which allows IPv4 and IPv6 source and destination addresses change. Offload suitable mangles to the IPv4 address change action. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07mlxsw: core_acl_flex_actions: Add SIP_DIP_ACTIONDanielle Ratson1-0/+45
Add fields related to SIP_DIP_ACTION, which is used for changing of SIP and DIP addresses. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07Merge branch 'ipv6-kfree_skb_reason'David S. Miller5-21/+84
Menglong Dong says: ==================== net: use kfree_skb_reason() for ip/udp packet receive In this series patches, kfree_skb() is replaced with kfree_skb_reason() during ipv4 and udp4 packet receiving path, and following drop reasons are introduced: SKB_DROP_REASON_SOCKET_FILTER SKB_DROP_REASON_NETFILTER_DROP SKB_DROP_REASON_OTHERHOST SKB_DROP_REASON_IP_CSUM SKB_DROP_REASON_IP_INHDR SKB_DROP_REASON_IP_RPFILTER SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST SKB_DROP_REASON_XFRM_POLICY SKB_DROP_REASON_IP_NOPROTO SKB_DROP_REASON_SOCKET_RCVBUFF SKB_DROP_REASON_PROTO_MEM TCP is more complex, so I left it in the next series. I just figure out how __print_symbolic() works. It doesn't base on the array index, but searching for symbols by loop. So I'm a little afraid it's performance. Changes since v3: - fix some small problems in the third patch (net: ipv4: use kfree_skb_reason() in ip_rcv_core()), as David Ahern said Changes since v2: - use SKB_DROP_REASON_PKT_TOO_SMALL for a path in ip_rcv_core() Changes since v1: - add document for all drop reasons, as David advised - remove unreleated cleanup - remove EARLY_DEMUX and IP_ROUTE_INPUT drop reason - replace {UDP, TCP}_FILTER with SOCKET_FILTER ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb()Menglong Dong3-3/+14
Replace kfree_skb() with kfree_skb_reason() in __udp_queue_rcv_skb(). Following new drop reasons are introduced: SKB_DROP_REASON_SOCKET_RCVBUFF SKB_DROP_REASON_PROTO_MEM Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: udp: use kfree_skb_reason() in udp_queue_rcv_one_skb()Menglong Dong1-3/+9
Replace kfree_skb() with kfree_skb_reason() in udp_queue_rcv_one_skb(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: ipv4: use kfree_skb_reason() in ip_protocol_deliver_rcu()Menglong Dong3-2/+7
Replace kfree_skb() with kfree_skb_reason() in ip_protocol_deliver_rcu(). Following new drop reasons are introduced: SKB_DROP_REASON_XFRM_POLICY SKB_DROP_REASON_IP_NOPROTO Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core()Menglong Dong3-4/+22
Replace kfree_skb() with kfree_skb_reason() in ip_rcv_finish_core(), following drop reasons are introduced: SKB_DROP_REASON_IP_RPFILTER SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: ipv4: use kfree_skb_reason() in ip_rcv_core()Menglong Dong3-2/+22
Replace kfree_skb() with kfree_skb_reason() in ip_rcv_core(). Three new drop reasons are introduced: SKB_DROP_REASON_OTHERHOST SKB_DROP_REASON_IP_CSUM SKB_DROP_REASON_IP_INHDR Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: netfilter: use kfree_drop_reason() for NF_DROPMenglong Dong3-1/+4
Replace kfree_skb() with kfree_skb_reason() in nf_hook_slow() when skb is dropped by reason of NF_DROP. Following new drop reasons are introduced: SKB_DROP_REASON_NETFILTER_DROP Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net: skb_drop_reason: add document for drop reasonsMenglong Dong1-6/+6
Add document for following existing drop reasons: SKB_DROP_REASON_NOT_SPECIFIED SKB_DROP_REASON_NO_SOCKET SKB_DROP_REASON_PKT_TOO_SMALL SKB_DROP_REASON_TCP_CSUM SKB_DROP_REASON_SOCKET_FILTER SKB_DROP_REASON_UDP_CSUM Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06ref_tracker: remove filter_irq_stacks() callEric Dumazet1-2/+0
After commit e94006608949 ("lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()") it became unnecessary to filter the stack before calling stack_depot_save(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06net: initialize init_net earlierEric Dumazet4-14/+14
While testing a patch that will follow later ("net: add netns refcount tracker to struct nsproxy") I found that devtmpfs_init() was called before init_net was initialized. This is a bug, because devtmpfs_setup() calls ksys_unshare(CLONE_NEWNS); This has the effect of increasing init_net refcount, which will be later overwritten to 1, as part of setup_net(&init_net) We had too many prior patches [1] trying to work around the root cause. Really, make sure init_net is in BSS section, and that net_ns_init() is called earlier at boot time. Note that another patch ("vfs: add netns refcount tracker to struct fs_context") also will need net_ns_init() being called before vfs_caches_init() As a bonus, this patch saves around 4KB in .data section. [1] f8c46cb39079 ("netns: do not call pernet ops for not yet set up init_net namespace") b5082df8019a ("net: Initialise init_net.count to 1") 734b65417b24 ("net: Statically initialize init_net.dev_base_head") v2: fixed a build error reported by kernel build bots (CONFIG_NET=n) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06net: hsr: use hlist_head instead of list_head for mac addressesJuhee Kang7-107/+171
Currently, HSR manages mac addresses of known HSR nodes by using list_head. It takes a lot of time when there are a lot of registered nodes due to finding specific mac address nodes by using linear search. We can be reducing the time by using hlist. Thus, this patch moves list_head to hlist_head for mac addresses and this allows for further improvement of network performance. Condition: registered 10,000 known HSR nodes Before: # iperf3 -c 192.168.10.1 -i 1 -t 10 Connecting to host 192.168.10.1, port 5201 [ 5] local 192.168.10.2 port 59442 connected to 192.168.10.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.49 sec 3.75 MBytes 21.1 Mbits/sec 0 158 KBytes [ 5] 1.49-2.05 sec 1.25 MBytes 18.7 Mbits/sec 0 166 KBytes [ 5] 2.05-3.06 sec 2.44 MBytes 20.3 Mbits/sec 56 16.9 KBytes [ 5] 3.06-4.08 sec 1.43 MBytes 11.7 Mbits/sec 11 38.0 KBytes [ 5] 4.08-5.00 sec 951 KBytes 8.49 Mbits/sec 0 56.3 KBytes After: # iperf3 -c 192.168.10.1 -i 1 -t 10 Connecting to host 192.168.10.1, port 5201 [ 5] local 192.168.10.2 port 36460 connected to 192.168.10.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 7.39 MBytes 62.0 Mbits/sec 3 130 KBytes [ 5] 1.00-2.00 sec 5.06 MBytes 42.4 Mbits/sec 16 113 KBytes [ 5] 2.00-3.00 sec 8.58 MBytes 72.0 Mbits/sec 42 94.3 KBytes [ 5] 3.00-4.00 sec 7.44 MBytes 62.4 Mbits/sec 2 131 KBytes [ 5] 4.00-5.07 sec 8.13 MBytes 63.5 Mbits/sec 38 92.9 KBytes Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05skmsg: convert struct sk_msg_sg::copy to a bitmapEric Dumazet2-8/+7
We have plans for increasing MAX_SKB_FRAGS, but sk_msg_sg::copy is currently an unsigned long, limiting MAX_SKB_FRAGS to 30 on 32bit arches. Convert it to a bitmap, as Jakub suggested. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: typhoon: implement ndo_features_check methodEric Dumazet1-5/+16
Instead of disabling TSO at compile time if MAX_SKB_FRAGS > 32, implement ndo_features_check() method for this driver for a more dynamic handling. If skb has more than 32 frags and is a GSO packet, force software segmentation. Most locally generated packets will use a small number of fragments anyway. For forwarding workloads, we can limit gro_max_size at ingress, we might also implement gro_max_segs if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: sundance: Replace one-element array with non-array objectGustavo A. R. Silva1-30/+30
It seems this one-element array is not actually being used as an array of variable size, so we can just replace it with just a non-array object of type struct desc_frag and refactor a bit the rest of the code. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). This issue was found with the help of Coccinelle and audited and fixed, manually. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05bnx2x: Replace one-element array with flexible-array memberGustavo A. R. Silva1-1/+1
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). This issue was found with the help of Coccinelle and audited and fixed, manually. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'net-mana-next'David S. Miller1-5/+5
Haiyang Zhang says: ==================== net: mana: Add handling of CQE_RX_TRUNCATED and a cleanup Add handling of CQE_RX_TRUNCATED and a cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: mana: Remove unnecessary check of cqe_type in mana_process_rx_cqe()Haiyang Zhang1-3/+0
The switch statement already ensures cqe_type == CQE_RX_OKAY at that point. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: mana: Add handling of CQE_RX_TRUNCATEDHaiyang Zhang1-2/+5
The proper way to drop this kind of CQE is advancing rxq tail without indicating the packet to the upper network layer. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'net-dev-tracking-improvements'David S. Miller5-28/+70
Eric Dumazet says: ==================== net: device tracking improvements Main goal of this series is to be able to detect the following case which apparently is still haunting us. dev_hold_track(dev, tracker_1, GFP_ATOMIC); dev_hold(dev); dev_put(dev); dev_put(dev); // Should complain loudly here. dev_put_track(dev, tracker_1); // instead of here (as before this series) v2: third patch: I replaced the dev_put() in linkwatch_do_dev() with __dev_put(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: refine dev_put()/dev_hold() debuggingEric Dumazet3-27/+50
We are still chasing some syzbot reports where we think a rogue dev_put() is called with no corresponding prior dev_hold(). Unfortunately it eats a reference on dev->dev_refcnt taken by innocent dev_hold_track(), meaning that the refcount saturation splat comes too late to be useful. Make sure that 'not tracked' dev_put() and dev_hold() better use CONFIG_NET_DEV_REFCNT_TRACKER=y debug infrastructure: Prior patch in the series allowed ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and to use a separate refcount only to detect too many put() even in the following case: dev_hold_track(dev, tracker_1, GFP_ATOMIC); dev_hold(dev); dev_put(dev); dev_put(dev); // Should complain loudly here. dev_put_track(dev, tracker_1); // instead of here Add clarification about netdev_tracker_alloc() role. v2: I replaced the dev_put() in linkwatch_do_dev() with __dev_put() because callers called netdev_tracker_free(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05ref_tracker: add a count of untracked referencesEric Dumazet2-1/+13
We are still chasing a netdev refcount imbalance, and we suspect we have one rogue dev_put() that is consuming a reference taken from a dev_hold_track() To detect this case, allow ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and use a dedicated refcount_t just for them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05ref_tracker: implement use-after-free detectionEric Dumazet2-0/+7
Whenever ref_tracker_dir_init() is called, mark the struct ref_tracker_dir as dead. Test the dead status from ref_tracker_alloc() and ref_tracker_free() This should detect buggy dev_put()/dev_hold() happening too late in netdevice dismantle process. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'ipv6-mc_forwarding-changes'David S. Miller5-9/+12
Eric Dumazet says: ==================== ipv6: mc_forwarding changes First patch removes minor data-races, as mc_forwarding can be locklessly read in fast path. Second patch adds a short cut in ip6mr_sk_done() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05ip6mr: ip6mr_sk_done() can exit early in common casesEric Dumazet1-0/+3
In many cases, ip6mr_sk_done() is called while no ipmr socket has been registered. This removes 4 rtnl acquisitions per netns dismantle, with following callers: igmp6_net_exit(), tcpv6_net_exit(), ndisc_net_exit() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05ipv6: make mc_forwarding atomicEric Dumazet5-9/+9
This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: dsa: realtek: don't default Kconfigs to yJakub Kicinski1-4/+0
We generally default the vendor to y and the drivers itself to n. NET_DSA_REALTEK, however, selects a whole bunch of things, so it's not a pure "vendor selection" knob. Let's default it all to n. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: sparx5: remove phylink_config.pcs_poll usageRussell King (Oracle)1-1/+0
Phylink will use PCS polling whenever phylink_config.pcs_poll or the phylink_pcs poll member is set. As this driver sets both, remove the former. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: phylink: remove phylink_set_10g_modes()Russell King (Oracle)2-12/+0
phylink_set_10g_modes() is no longer used with the conversion of drivers to phylink_generic_validate(), so we can remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'gro-minor-opts'David S. Miller2-36/+32
Paolo Abeni says: ==================== gro: a couple of minor optimization This series collects a couple of small optimizations for the GRO engine, reducing slightly the number of cycles for dev_gro_receive(). The delta is within noise range in tput tests, but with big TCP coming every cycle saved from the GRO engine will count - I hope ;) v1 -> v2: - a few cleanup suggested from Alexander(s) - moved away the more controversial 3rd patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: gro: minor optimization for dev_gro_receive()Paolo Abeni2-35/+32
While inspecting some perf report, I noticed that the compiler emits suboptimal code for the napi CB initialization, fetching and storing multiple times the memory for flags bitfield. This is with gcc 10.3.1, but I observed the same with older compiler versions. We can help the compiler to do a nicer work clearing several fields at once using an u32 alias. The generated code is quite smaller, with the same number of conditional. Before: objdump -t net/core/gro.o | grep " F .text" 0000000000000bb0 l F .text 0000000000000357 dev_gro_receive After: 0000000000000bb0 l F .text 000000000000033c dev_gro_receive v1 -> v2: - use struct_group (Alexander and Alex) RFC -> v1: - use __struct_group to delimit the zeroed area (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: gro: avoid re-computing truesize twice on recyclePaolo Abeni1-1/+0
After commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference") and commit af352460b465 ("net: fix GRO skb truesize update") the truesize of the skb with stolen head is properly updated by the GRO engine, we don't need anymore resetting it at recycle time. v1 -> v2: - clarify the commit message (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: dsa: qca8k: check correct variable in qca8k_phy_eth_command()Dan Carpenter1-1/+1
This is a copy and paste bug. It was supposed to check "clear_skb" instead of "write_skb". Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'lan966x-mcast-snooping'David S. Miller5-2/+166
Horatiu Vultur says: ==================== net: lan966x: add support for mcast snooping Implement the switchdev callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED to allow to enable/disable multicast snooping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: lan966x: Update mdb when enabling/disabling mcast_snoopingHoratiu Vultur3-0/+51
When the multicast snooping is disabled, the mdb entries should be removed from the HW, but they still need to be kept in memory for when the mcast_snooping will be enabled again. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLEDHoratiu Vultur4-1/+110
The callback allows to enable/disable multicast snooping. When the snooping is enabled, all IGMP and MLD frames are redirected to the CPU, therefore make sure not to set the skb flag 'offload_fwd_mark'. The HW will not flood multicast ipv4/ipv6 data frames. When the snooping is disabled, the HW will flood IGMP, MLD and multicast ipv4/ipv6 frames according to the mcast_flood flag. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: lan966x: Update the PGID used by IPV6 data framesHoratiu Vultur1-1/+5
When enabling the multicast snooping, the forwarding of the IPV6 frames has it's own forwarding mask. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net/sched: Enable tc skb ext allocation on chain miss only when neededPaul Blakey5-23/+56
Currently tc skb extension is used to send miss info from tc to ovs datapath module, and driver to tc. For the tc to ovs miss it is currently always allocated even if it will not be used by ovs datapath (as it depends on a requested feature). Export the static key which is used by openvswitch module to guard this code path as well, so it will be skipped if ovs datapath doesn't need it. Enable this code path once ovs datapath needs it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'mptcp-improve-set-flags-command-and-update-self-tests'Jakub Kicinski4-388/+595
Mat Martineau says: ==================== mptcp: Improve set-flags command and update self tests Patches 1-3 allow more flexibility in the combinations of features and flags allowed with the MPTCP_PM_CMD_SET_FLAGS netlink command, and add self test case coverage for the new functionality. Patches 4-6 and 9 refactor the mptcp_join.sh self tests to allow them to configure all of the test cases using either the pm_nl_ctl utility (part of the mptcp self tests) or the 'ip mptcp' command (from iproute2). The default remains to use pm_nl_ctl. Patches 7 and 8 update the pm_netlink.sh self tests to cover the use of endpoint ids to set endpoint flags (instead of just addresses). ==================== Link: https://lore.kernel.org/r/20220205000337.187292-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: set ip_mptcp in command lineGeliang Tang1-3/+9
This patch added a command line option '-i' for mptcp_join.sh to use 'ip mptcp' commands instead of using 'pm_nl_ctl' commands to deal with PM netlink. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>