starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2021-08-17	octeontx2-pf: Sort the allocated MCAM entry indices	Sunil Goutham	1	-0/+15
	Per single mailbox request a maximum of 256 MCAM entries can be allocated. If more than 256 are being allocated, then the mcam indices in the final list could get jumbled. Hence sort the indices. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: Ntuple filters support for VF netdev	Rakesh Babu	5	-60/+98
	Add packet flow classification support for both LMAC mapped virtual functions and loopback VFs. This patch adds supports for ntuple offload feature. Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: Enable NETIF_F_RXALL support for VF driver	Sunil Goutham	2	-4/+4
	Enabled NETIF_F_RXALL support for VF driver. Also removed MTU range comments which are no longer valid. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: Add debug messages for failures	Sunil Goutham	1	-19/+73
	Added debug messages for various failures during probe. This will help in quickly identifying the API where the failure is happening. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: add proper return codes for AF mailbox handlers	Naveen Mamindlapalli	4	-21/+47
	Add appropriate error codes to be used when returning from AF mailbox handlers due to some error condition. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: Modify install flow error codes	Subbaraya Sundeep	2	-8/+15
	When installing a flow using npc_install_flow mailbox there are number of reasons to reject the request like caller is not permitted, invalid channel specified in request, flow not supported in extraction profile and so on. Hence define new error codes for npc flows and use them instead of generic error codes. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	Merge tag 'mlx5-updates-2021-08-16' of ↵	David S. Miller	19	-716/+1696
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-08-16 The following patchset provides two separate mlx5 updates 1) Ethtool RSS context and MQPRIO channel mode support: 1.1) enable mlx5e netdev driver to allow creating Transport Interface RX (TIRs) objects on the fly to be used for ethtool RSS contexts and TX MQPRIO channel mode 1.2) Introduce mlx5e_rss object to manage such TIRs. 1.3) Ethtool support for RSS context 1.4) Support MQPRIO channel mode 2) Bridge offloads Lag support: to allow adding bond net devices to mlx5 bridge 2.1) Address bridge port by (vport_num, esw_owner_vhca_id) pair since vport_num is only unique per eswitch and in lag mode we need to manage ports from both eswitches. 2.2) Allow connectivity between representors of different eswitch instances that are attached to same bridge 2.3) Bridge LAG, Require representors to be in shared FDB mode and introduce local and peer ports representors, match on paired eswitch metadata in peer FDB entries, And finally support addition/deletion and aging of peer flows. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	vrf: Reset skb conntrack connection on VRF rcv	Lahav Schlesinger	1	-0/+4
	To fix the "reverse-NAT" for replies. When a packet is sent over a VRF, the POST_ROUTING hooks are called twice: Once from the VRF interface, and once from the "actual" interface the packet will be sent from: 1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct() This causes the POST_ROUTING hooks to run. 2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again. Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and second vrf_l3_rcv() calls them again. As an example, consider the following SNAT rule: > iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1 In this case sending over a VRF will create 2 conntrack entries. The first is from the VRF interface, which performs the IP SNAT. The second will run the SNAT, but since the "expected reply" will remain the same, conntrack randomizes the source port of the packet: e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack rules are: udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1 udp 17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1 i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is SNAT-ed from 10000 --> 61033. But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later conntrack entry is matched: udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1 udp 17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1 And a "port 61033 unreachable" ICMP packet is sent back. The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(), the skb already has a conntrack flow attached to it, which means nf_conntrack_in() will not resolve the flow again. This means only the dest port is "reverse-NATed" (61033 -> 10000) but the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's not received. This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'. The fix is then to reset the flow when skb is received on a VRF, to let conntrack resolve the flow again (which now will hit the earlier flow). To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by running 'bash ./repro'): $ cat run_in_A1.py import logging logging.getLogger("scapy.runtime").setLevel(logging.ERROR) from scapy.all import * import argparse def get_packet_to_send(udp_dst_port, msg_name): return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \ IP(src='3.3.3.3', dst='2.2.2.2')/ \ UDP(sport=53, dport=udp_dst_port)/ \ Raw(f'{msg_name}\x0012345678901234567890') parser = argparse.ArgumentParser() parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True, help="From run_in_A3.py") parser.add_argument('-socket_port', dest="socket_port", type=str, required=True, help="From run_in_A3.py") parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True, help="From script") args, _ = parser.parse_known_args() iface_mac = args.iface_mac socket_port = int(args.socket_port) v1_mac = args.v1_mac print(f'Source port before NAT: {socket_port}') while True: pkts = sniff(iface='_v0', store=True, count=1, timeout=10) if 0 == len(pkts): print('Something failed, rerun the script :(', flush=True) break pkt = pkts[0] if not pkt.haslayer('UDP'): continue pkt_sport = pkt.getlayer('UDP').sport print(f'Source port after NAT: {pkt_sport}', flush=True) pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port') sendp(pkt_to_send, '_v0', verbose=False) # Will not be received pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port') sendp(pkt_to_send, '_v0', verbose=False) break $ cat run_in_A2.py import socket import netifaces print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}", flush=True) s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, str('vrf_1' + '\0').encode('utf-8')) s.connect(('3.3.3.3', 53)) print(f'{s. getsockname()[1]}', flush=True) s.settimeout(5) while True: try: # Periodically send in order to keep the conntrack entry alive. s.send(b'a'*40) resp = s.recvfrom(1024) msg_name = resp[0].decode('utf-8').split('\0')[0] print(f"Got {msg_name}", flush=True) except Exception as e: pass $ cat repro.sh ip netns del A1 2> /dev/null ip netns del A2 2> /dev/null ip netns add A1 ip netns add A2 ip -n A1 link add _v0 type veth peer name _v1 netns A2 ip -n A1 link set _v0 up ip -n A2 link add e00000 type bond ip -n A2 link add lo0 type dummy ip -n A2 link add vrf_1 type vrf table 10001 ip -n A2 link set vrf_1 up ip -n A2 link set e00000 master vrf_1 ip -n A2 addr add 1.1.1.1/24 dev e00000 ip -n A2 link set e00000 up ip -n A2 link set _v1 master e00000 ip -n A2 link set _v1 up ip -n A2 link set lo0 up ip -n A2 addr add 2.2.2.2/32 dev lo0 ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000 ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001 ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \ SNAT --to-source 2.2.2.2 -o vrf_1 sleep 5 ip netns exec A2 python3 run_in_A2.py > x & XPID=$! sleep 5 IFACE_MAC=`sed -n 1p x` SOCKET_PORT=`sed -n 2p x` V1_MAC=`ip -n A2 link show _v1 \| sed -n 2p \| awk '{print $2'}` ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \ ${SOCKET_PORT} -v1_mac ${SOCKET_PORT} sleep 5 kill -9 $XPID wait $XPID 2> /dev/null ip netns del A1 ip netns del A2 tail x -n 2 rm x set +x Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device") Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17	net/mlx5: Bridge, support LAG	Vlad Buslov	3	-48/+159
	Allow adding bond net devices to mlx5 bridge with following changes: - Modify bridge representor code to obtain uplink represetor that belongs to eswitch that is registered for notification. Require representor to be in shared FDB mode. If representor is the lag master, then consider its port as local, otherwise treat it as peer. - Use devcom to match on paired eswitch metadata in peer FDB entries. This is necessary for shared FDB LAG to function since packets are always received on active eswitch instance as opposed to parent eswitch of port. - Support for deleting peer flows when receiving SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous patches in series. Now also implement support for handling SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp to current jiffies when receiving such notification on eswitch that manages the local FDB entry. This allows peer entries to prevent ageing of the FDB. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5: Bridge, allow merged eswitch connectivity	Vlad Buslov	5	-28/+112
	Allow connectivity between representors of different eswitch instances that are attached to same bridge when merged_eswitch capability is enabled. Add ports of peer eswitch to bridge instance and mark them with MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event. Listen to the event on mlx5 bridge implementation and delete peer FDBs in event handler. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5: Bridge, extract FDB delete notification to function	Vlad Buslov	1	-14/+13
	SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in bridge code. Following patch in series changes the condition for the notification. Extract the notification into dedicated helper function mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the future changes. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pair	Vlad Buslov	6	-208/+263
	Following patches in series allow traffic between vports of different eswitch instances, which requires addressing bridge port by vport_num+esw_owner_vhca_id pair since vport_num is only unique per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with 'esw_owner_vhca_id' field and use it as part of key for mlx5_esw_bridge->vports xarray. With this change we can't rely on switchdev_handle_port_obj_add() helper to get mlx5 representor from stacked device because we need specifically representor from parent eswitch that registered the callback to obtain correct esw_owner_vhca_id. The helper doesn't allow passing additional parameters to predicate function and doesn't provide access to the notifier block to obtain eswitch through br_offloads. Implement custom helpers to obtain mlx5 representor and use them in mlx5_esw_bridge_port_obj_{add\|del\|attr_set}() implementations. Remove direct pointer to parent bridge from struct mlx5_vport as it is no longer needed. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5: Bridge, obtain core device from eswitch instead of priv	Vlad Buslov	1	-4/+2
	Following patches in series will pass bond device to bridge, which means the code can't assume the device is mlx5 representor. Moreover, the core device can be easily obtained from eswitch instance, so there is no reason for more complex code that obtains struct mlx5_priv from net_device in order to use its mdev. Refactor the code to use esw->dev instead of priv->mdev. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5: Bridge, release bridge in same function where it is taken	Vlad Buslov	1	-7/+9
	Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if mlx5_esw_bridge_vport_init() returned an error instead of relying on it to release the bridge. This improves the design because object instance is taken and released in same layer and simplifies following patches that add more logic to mlx5_esw_bridge_vport_link(). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Support MQPRIO channel mode	Tariq Toukan	3	-8/+102
	Add support for MQPRIO channel mode, in which a partition to TCs is defined over the channels. We allow partitions with contiguous queue indices, with no holes within. We do not allow modification to the num of channels while this MQPRIO mode is active. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Handle errors of netdev_set_num_tc()	Tariq Toukan	1	-6/+14
	Add handling for failures in netdev_set_num_tc(). Let mlx5e_netdev_set_tcs return an int. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Maintain MQPRIO mode parameter	Tariq Toukan	2	-17/+28
	This is in preparation for supporting MQPRIO CHANNEL mode in downstream patch, in addition to DCB mode that's supported today. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Abstract MQPRIO params	Tariq Toukan	6	-25/+37
	Abstract the MQPRIO params into a struct. Use a getter for DCB mode num_tcs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Support flow classification into RSS contexts	Tariq Toukan	5	-21/+131
	Extend the existing flow classification support, to steer flows not only directly to a receive ring, but also into the new RSS contexts. Create needed TIR objects on demand, and hold reference on the RSS context. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Support multiple RSS contexts	Tariq Toukan	5	-51/+273
	Add support to multiple RSS contexts. Resources of the non-default RSS contexts are allocated and created on demand. Each RSS context can be controlled and configured separately, via the implemented ethtool ops. Here we limit the num of total contexts to 16. We do not enforce any kind of new limitation over the indirection table content. More specifically, two separate contexts can be configured to fully or partially point to the same set of receive rings. The default RSS context (index 0) is created with its full set of TIRs. All other contexts are created with an empty set, then TIRs are added upon first usage when steering rules are added. We use a reference counting mechanism to make sure an RSS context is not removed before the rules pointing to it. Block ethtool set_channels operations when multiple RSS contexts exist, as currently the kernel doesn't protect against inconsistent channels configs that break non-default RSS contexts. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Dynamically allocate TIRs in RSS contexts	Tariq Toukan	1	-13/+56
	Move from static to dynamic memory allocations for TIR. This is in preparation to supporting on-demand TIR operations in downstream patches, where every RSS context will be init with an empty set of TIRs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Convert RSS to a dedicated object	Tariq Toukan	5	-428/+604
	Code related to RSS is now encapsulated into a dedicated object and put into new files en/rss.{c,h}. All usages are converted. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Introduce abstraction of RSS context	Tariq Toukan	3	-73/+105
	Bring all fields that define and maintain RSS behavior together into a new structure. Align all usages with this new structure. Keep it hidden within rx_res.c. This helps supporting multiple RSS contexts in downstream patch. Use dynamic allocations for the RSS context. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Introduce TIR create/destroy API in rx_res	Tariq Toukan	1	-57/+83
	Take TIR control operations in rx_res into functions. This is in preparation to supporting on-demand TIR operations in downstream patches. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	net/mlx5e: Do not try enable RSS when resetting indir table	Tariq Toukan	1	-5/+2
	All calls to mlx5e_rx_res_rss_set_indir_uniform() occur while the RSS state is inactive, i.e. the RQT is pointing to the drop RQ, not to the channels' RQs. It means that the "apply" part of the function is not called. Remove this part from the function, and document the change. It will be useful for next patches in the series, allows code simplifications when multiple RSS contexts are introduced. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17	bpf: Refactor BPF_PROG_RUN into a function	Andrii Nakryiko	2	-5/+5
	Turn BPF_PROG_RUN into a proper always inlined function. No functional and performance changes are intended, but it makes it much easier to understand what's going on with how BPF programs are actually get executed. It's more obvious what types and callbacks are expected. Also extra () around input parameters can be dropped, as well as `__` variable prefixes intended to avoid naming collisions, which makes the code simpler to read and write. This refactoring also highlighted one extra issue. BPF_PROG_RUN is both a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning BPF_PROG_RUN into a function causes naming conflict compilation error. So rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
2021-08-17	net: hns3: add support ethtool extended link state	Guangbin Huang	5	-0/+101
	In order to know the reason of link up failure, add supporting ethtool extended link state. Driver reads the link status code from firmware if in link down state and converts it to ethtool extended link state. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17	net: hns3: add header file hns3_ethtoo.h	Guangbin Huang	2	-15/+26
	Add a new file hns3_ethtool.h, and move struct type definitions from hns3_ethtool.c to hns3_ethtool.h. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	bonding: improve nl error msg when device can't be enslaved because of ↵	Antoine Tenart	1	-1/+1
	IFF_MASTER Use a more user friendly netlink error message when a device can't be enslaved because it has IFF_MASTER, by not referring directly to a kernel internal flag. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: iosm: Prevent underflow in ipc_chnl_cfg_get()	Dan Carpenter	1	-4/+3
	The bounds check on "index" doesn't catch negative values. Using ARRAY_SIZE() directly is more readable and more robust because it prevents negative values for "index". Fortunately we only pass valid values to ipc_chnl_cfg_get() so this patch does not affect runtime. Reported-by: Solomon Ucko <solly.ucko@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: stmmac: add ethtool per-queue irq statistic support	Vijayakannan Ayyathurai	3	-0/+6
	Adding ethtool per-queue statistics support to show number of interrupts generated at DMA tx and DMA rx. All the counters are incremented at dwmac4_dma_interrupt function. Signed-off-by: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: stmmac: add ethtool per-queue statistic framework	Vijayakannan Ayyathurai	3	-1/+80
	Adding generic ethtool per-queue statistic framework to display the statistics for each rx/tx queue. In future, users can avail it to add more per-queue specific counters. Number of rx/tx queues displayed is depending on the available rx/tx queues in that particular MAC config and this number is limited up to the MTL_MAX_{RX\|TX}_QUEUES defined in the driver. Ethtool per-queue statistic display will look like below, when users start adding more counters. Example: q0_tx_statA: q0_tx_statB: q0_tx_statC: \| q0_tx_statX: . . . qMAX_tx_statA: qMAX_tx_statB: qMAX_tx_statC: \| qMAX_tx_statX: q0_rx_statA: q0_rx_statB: q0_rx_statC: \| q0_rx_statX: . . . qMAX_rx_statA: qMAX_rx_statB: qMAX_rx_statC: \| qMAX_rx_statX: In addition, this patch has the support on displaying the number of packets received and transmitted per queue. Signed-off-by: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: stmmac: fix INTR TBU status affecting irq count statistic	Voon Weifeng	1	-2/+3
	DMA channel status "Transmit buffer unavailable(TBU)" bit is not considered as a successful dma tx. Hence, it should not affect all the irq count statistic. Fixes: 1103d3a5531c ("net: stmmac: dwmac4: Also use TBU interrupt to clean TX path") Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	bnxt_en: Add missing DMA memory barriers	Michael Chan	1	-0/+12
	Each completion ring entry has a valid bit to indicate that the entry contains a valid completion event. The driver's main poll loop __bnxt_poll_work() has the proper dma_rmb() to make sure the valid bit of the next entry has been checked before proceeding further. But when we call bnxt_rx_pkt() to process the RX event, the RX completion event consists of two completion entries and only the first entry has been checked to be valid. We need the same barrier after checking the next completion entry. Add missing dma_rmb() barriers in bnxt_rx_pkt() and other similar locations. Fixes: 67a95e2022c7 ("bnxt_en: Need memory barrier when processing the completion ring.") Reported-by: Lance Richardson <lance.richardson@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Lance Richardson <lance.richardson@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	bnxt_en: Disable aRFS if running on 212 firmware	Michael Chan	1	-0/+3
	212 firmware broke aRFS, so disable it. Traffic may stop after ntuple filters are inserted and deleted by the 212 firmware. Fixes: ae10ae740ad2 ("bnxt_en: Add new hardware RFS mode.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering	Vladimir Oltean	1	-198/+199
	The sja1105 driver's initialization and teardown sequence is a chaotic mess that has gathered a lot of cruft over time. It works because there is no strict dependency between the functions, but it could be improved. The basic principle that teardown should be the exact reverse of setup is obviously not held. We have initialization steps (sja1105_tas_setup, sja1105_flower_setup) in the probe method that are torn down in the DSA .teardown method instead of driver unbind time. We also have code after the dsa_register_switch() call, which implicitly means after the .setup() method has finished, which is pretty unusual. Also, sja1105_teardown() has calls set up in a different order than the error path of sja1105_setup(): see the reversed ordering between sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister. Also, sja1105_static_config_load() is called towards the end of sja1105_setup(), but sja1105_static_config_free() is also towards the end of the error path and teardown path. The static_config_load() call should be earlier. Also, making and breaking the connections between struct sja1105_port and struct dsa_port could be refactored into dedicated functions, makes the code easier to follow. We move some code from the DSA .setup() method into the probe method, like the device tree parsing, and we move some code from the probe method into the DSA .setup() method to be symmetric with its placement in the DSA .teardown() method, which is nice because the unbind function has a single call to dsa_unregister_switch(). Example of the latter type of code movement are the connections between ports mentioned above, they are now in the .setup() method. Finally, due to fact that the kthread_init_worker() call is no longer in sja1105_probe() - located towards the bottom of the file - but in sja1105_setup() - located much higher - there is an inverse ordering with the worker function declaration, sja1105_port_deferred_xmit. To avoid that, the entire sja1105_setup() and sja1105_teardown() functions are moved towards the bottom of the file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	qed: Fix null-pointer dereference in qed_rdma_create_qp()	Shai Malin	1	-2/+1
	Fix a possible null-pointer dereference in qed_rdma_create_qp(). Changes from V2: - Revert checkpatch fixes. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	qed: qed ll2 race condition fixes	Shai Malin	1	-0/+20
	Avoiding qed ll2 race condition and NULL pointer dereference as part of the remove and recovery flows. Changes form V1: - Change (!p_rx->set_prod_addr). - qed_ll2.c checkpatch fixes. Change from V2: - Revert "qed_ll2.c checkpatch fixes". Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	r8169: rename rtl_csi_access_enable to rtl_set_aspm_entry_latency	Heiner Kallweit	1	-4/+7
	Rename the function to reflect what it's doing. Also add a description of the register values as kindly provided by Realtek. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: mscc: ocelot: convert to phylink	Vladimir Oltean	7	-250/+314
	The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: dsa: felix: stop calling ocelot_port_{enable,disable}	Vladimir Oltean	3	-36/+9
	ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following fields: - LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of which are written with their hardware default values, also runtime invariants. So it makes no sense to write these during every .ndo_open. - PORTID_VAL: this field has an out-of-reset value of zero for all ports and must be initialized by software. Additionally, the ocelot_setup_logical_port_ids() code path sets up different logical port IDs for the ports in a hardware LAG, and we absolutely don't want .ndo_open to interfere there and reset those values. So in fact the write from ocelot_port_enable can better be moved to ocelot_init_port, and the .ndo_open hook deleted. ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA, in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop does not get called each time the link falls (i.e. this isn't a substitute for .phylink_mac_link_down), felix already does better at this by writing those registers already in felix_phylink_mac_link_down. So keep ocelot_port_disable (for now, until ocelot is converted to phylink too), and just delete the felix call to it, which is not necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: 6pack: fix slab-out-of-bounds in decode_data	Pavel Skripkin	1	-0/+6
	Syzbot reported slab-out-of bounds write in decode_data(). The problem was in missing validation checks. Syzbot's reproducer generated malicious input, which caused decode_data() to be called a lot in sixpack_decode(). Since rx_count_cooked is only 400 bytes and noone reported before, that 400 bytes is not enough, let's just check if input is malicious and complain about buffer overrun. Fail log: ================================================================== BUG: KASAN: slab-out-of-bounds in drivers/net/hamradio/6pack.c:843 Write of size 1 at addr ffff888087c5544e by task kworker/u4:0/7 CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.6.0-rc3-syzkaller #0 ... Workqueue: events_unbound flush_to_ldisc Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:641 __asan_report_store1_noabort+0x17/0x20 mm/kasan/generic_report.c:137 decode_data.part.0+0x23b/0x270 drivers/net/hamradio/6pack.c:843 decode_data drivers/net/hamradio/6pack.c:965 [inline] sixpack_decode drivers/net/hamradio/6pack.c:968 [inline] Reported-and-tested-by: syzbot+fc8cd9a673d4577fb2e4@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: phy: marvell: Add WAKE_PHY support to WOL event	Song Yoong Siang	1	-3/+36
	Add Wake-on-PHY feature support by enabling the Link Up Event. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: pcs: xpcs: Add Pause Mode support for SGMII and 2500BaseX	Wong Vee Khee	1	-0/+4
	SGMII/2500BaseX supports Pause frame as defined in the IEEE802.3x Flow Control standardization. Add this as a supported feature under the xpcs_sgmii_features struct. Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	MDIO: Kconfig: Specify more IPQ chipset supported	Luo Jie	2	-1/+2
	The IPQ MDIO driver currently supports the chipset IPQ40xx, IPQ807x, IPQ60xx and IPQ50xx. Add the compatible 'qcom,ipq5018-mdio' because of ethernet LDO dedicated to the IPQ5018 platform. Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16	net: mdio: Add the reset function for IPQ MDIO driver	Luo Jie	2	-0/+44
	1. configure the MDIO clock source frequency. 2. the LDO resource is needed to configure the ethernet LDO available for CMN_PLL. Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14	net: ipa: don't hold clock reference while netdev open	Alex Elder	1	-2/+10
	Currently a clock reference is taken whenever the ->ndo_open callback for the modem netdev is called. That reference is dropped when the device is closed, in ipa_stop(). We no longer need this, because ipa_start_xmit() now handles the situation where the hardware power state is not active. Drop the clock reference in ipa_open() when we're done, and take a new reference in ipa_stop() before we begin closing the interface. Finally (and unrelated, but trivial), change the return type of ipa_start_xmit() to be netdev_tx_t instead of int. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14	net: ipa: don't stop TX on suspend	Alex Elder	1	-2/+0
	Currently we stop the modem netdev transmit queue when suspending the hardware. For system suspend this ensured we'd never attempt to transmit while attempting to suspend the modem endpoints. For runtime suspend, the IPA hardware might get suspended while the system is operating. In that case we want an attempt to transmit a packet to cause the hardware to resume if necessary. But if we disable the queue this cannot happen. So stop disabling the queue on suspend. In case we end up disabling it in ipa_start_xmit() (see the previous commit), we still arrange to start the TX queue on resume. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14	net: ipa: ensure hardware has power in ipa_start_xmit()	Alex Elder	1	-1/+29
	We need to ensure the hardware is powered when we transmit a packet. But if it's not, we can't block to wait for it. So asynchronously request power in ipa_start_xmit(), and only proceed if the return value indicates the power state is active. If the hardware is not active, a runtime resume request will have been initiated. In that case, stop the network stack from further transmit attempts until the resume completes. Return NETDEV_TX_BUSY, to retry sending the packet once the queue is restarted. If the power request returns an error (other than -EINPROGRESS, which just means a resume requested elsewhere isn't complete), just drop the packet. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14	net: ipa: re-enable transmit in PM WQ context	Alex Elder	1	-2/+28
	Create a new work structure in the modem private data, and use it to re-enable the modem network device transmit queue when resuming. This is needed by the next patch, which stops the TX queue if IPA power isn't active when a transmit request arrives. Packets will start arriving the instant the TX queue is enabled, but resuming isn't complete until ipa_modem_resume() returns. This way we're sure to be resumed before transmits are allowed again. Cancel it before calling ipa_stop() in ipa_modem_stop() to ensure the transmit queue restart completes before it gets stopped there. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>