path: root/drivers/net/ethernet

2022-10-03  octeontx2-af: cn10k: mcs: Handle MCS block interrupts  (Geetha sowjanya, 8 files, -12/+865)

Hardware triggers an interrupt for events such as the PN wrapping to zero or the PN crossing a set threshold. This interrupt is received by the MCS AF. The MCS AF then finds the PF/VF to which the SA is mapped and notifies it using the mcs_intr_notify mbox message. Using the mcs_intr_cfg mbox, a PF/VF can configure the list of interrupts for which it wants to receive notifications from the AF.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  octeontx2-af: cn10k: mcs: Support for stats collection  (Geetha sowjanya, 6 files, -0/+1048)

Add mailbox messages to return the resource stats to the caller. Stats for SecYs, SCs and SAs as per the MACsec standard, TCAM flow ID hit/miss counters, and a mailbox to clear the stats are implemented.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic  (Geetha sowjanya, 3 files, -0/+69)

Out of all the TCAM entries, reserve the last TX and RX TCAM flow entries (low priority) so that normal traffic can be sent and received. Traffic that needs MACsec processing hits the high-priority TCAM flows. Also install an FLR handler to free the allocated resources for a PF/VF.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  octeontx2-af: cn10k: mcs: Manage the MCS block hardware resources  (Geetha sowjanya, 6 files, -1/+1530)

To establish a MACsec connection association, the netdev driver needs hardware resources such as SecYs, TCAM flows, SCs and SAs. This patch manages allocating, freeing and configuring those resources. AF consumers can request resources and configure them via these mailbox messages. The AF can keep allocating until it runs out of hardware resources.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  octeontx2-af: cn10k: mcs: Add mailboxes for port related operations  (Geetha sowjanya, 5 files, -4/+376)

There is a set of configurations to be done at the MCS port level, such as bringing a port out of reset and making a port operational or bypassed. This patch adds all the port-related mailbox message handlers so that AF consumers can use them.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  octeontx2-af: cn10k: Introduce driver for macsec block.  (Geetha sowjanya, 8 files, -1/+678)

CN10K-B and CNF10K-B have a MACsec block (MCS) to encrypt and decrypt packets at the MAC level. This block is a global resource with hardware resources such as SecYs, SCs and SAs, and it sits between the NIX block and the RPM LMAC. The CN10K-B silicon has only one MCS block, which receives packets from all LMACs, whereas CNF10K-B has seven MCS blocks for seven LMACs. Both MCS blocks are similar in operation, except for a few register offsets and some configurations that require writing to different registers. Those differences between the IPs are handled using separate ops. This patch adds the basic driver and does the initial hardware calibration and parser configuration.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  eth: sp7021: fix use after free bug in spl2sw_nvmem_get_mac_address  (Zheng Wang, 1 file, -1/+1)

This frees "mac" and tries to display its address as part of the error message on the next line. Swap the order.

Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  net: lan966x: Add port mirroring support using tc-matchall  (Horatiu Vultur, 5 files, -1/+193)

Add support for port mirroring. It is possible to mirror only one port at a time, and it is possible to have both ingress and egress mirroring. Frames injected by the CPU don't get egress-mirrored because they bypass the analyzer module.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  net: lan966x: Add port police support using tc-matchall  (Horatiu Vultur, 6 files, -1/+468)

Add support for port policing. Policing is possible only on the ingress side. Adding policer support also required adding tc-matchall classifier offload support.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-10-03  net: fec: using page pool to manage RX buffers  (Shenwei Wang, 3 files, -58/+119)

This patch optimizes the RX buffer management by using the page pool. The purpose of this change is to prepare for the upcoming XDP support. The current driver uses one frame per page for easy management.

Added the __maybe_unused attribute to the following functions to avoid compile warnings. Those functions will be removed by a separate patch once this page pool solution is accepted.
- fec_enet_new_rxbdp
- fec_enet_copybreak

The following are the comparison results between the page pool implementation and the original implementation (non page pool). Small packet (64 bytes) results are almost the same no matter which implementation is used, on both the i.MX8 and i.MX6SX platforms.

shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 39728 connected with 10.81.16.245 port 5001
[ ID] Interval        Transfer      Bandwidth
[ 1] 0.0000-1.0000 sec  37.0 MBytes  311 Mbits/sec
[ 1] 1.0000-2.0000 sec  36.6 MBytes  307 Mbits/sec
[ 1] 2.0000-3.0000 sec  37.2 MBytes  312 Mbits/sec
[ 1] 3.0000-4.0000 sec  37.1 MBytes  312 Mbits/sec
[ 1] 4.0000-5.0000 sec  37.2 MBytes  312 Mbits/sec
[ 1] 5.0000-6.0000 sec  37.2 MBytes  312 Mbits/sec
[ 1] 6.0000-7.0000 sec  37.2 MBytes  312 Mbits/sec
[ 1] 7.0000-8.0000 sec  37.2 MBytes  312 Mbits/sec
[ 1] 0.0000-8.0943 sec   299 MBytes  310 Mbits/sec

--- Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001
[ ID] Interval        Transfer      Bandwidth
[ 1] 0.0000-1.0000 sec   111 MBytes  933 Mbits/sec
[ 1] 1.0000-2.0000 sec   111 MBytes  934 Mbits/sec
[ 1] 2.0000-3.0000 sec   112 MBytes  935 Mbits/sec
[ 1] 3.0000-4.0000 sec   111 MBytes  933 Mbits/sec
[ 1] 4.0000-5.0000 sec   111 MBytes  934 Mbits/sec
[ 1] 5.0000-6.0000 sec   111 MBytes  933 Mbits/sec
[ 1] 6.0000-7.0000 sec   111 MBytes  931 Mbits/sec
[ 1] 7.0000-8.0000 sec   112 MBytes  935 Mbits/sec
[ 1] 8.0000-9.0000 sec   111 MBytes  933 Mbits/sec
[ 1] 9.0000-10.0000 sec  112 MBytes  935 Mbits/sec
[ 1] 0.0000-10.0077 sec 1.09 GBytes  933 Mbits/sec

--- Non Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
[ ID] Interval        Transfer      Bandwidth
[ 1] 0.0000-1.0000 sec   104 MBytes  868 Mbits/sec
[ 1] 1.0000-2.0000 sec   105 MBytes  878 Mbits/sec
[ 1] 2.0000-3.0000 sec   105 MBytes  881 Mbits/sec
[ 1] 3.0000-4.0000 sec   105 MBytes  879 Mbits/sec
[ 1] 4.0000-5.0000 sec   105 MBytes  878 Mbits/sec
[ 1] 5.0000-6.0000 sec   105 MBytes  878 Mbits/sec
[ 1] 6.0000-7.0000 sec   104 MBytes  875 Mbits/sec
[ 1] 7.0000-8.0000 sec   104 MBytes  875 Mbits/sec
[ 1] 8.0000-9.0000 sec   104 MBytes  873 Mbits/sec
[ 1] 9.0000-10.0000 sec  104 MBytes  875 Mbits/sec
[ 1] 0.0000-10.0073 sec 1.02 GBytes  875 Mbits/sec

--- Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001
[ ID] Interval        Transfer      Bandwidth
[ 1] 0.0000-1.0000 sec  78.8 MBytes  661 Mbits/sec
[ 1] 1.0000-2.0000 sec  82.5 MBytes  692 Mbits/sec
[ 1] 2.0000-3.0000 sec  82.4 MBytes  691 Mbits/sec
[ 1] 3.0000-4.0000 sec  82.4 MBytes  691 Mbits/sec
[ 1] 4.0000-5.0000 sec  82.5 MBytes  692 Mbits/sec
[ 1] 5.0000-6.0000 sec  82.4 MBytes  691 Mbits/sec
[ 1] 6.0000-7.0000 sec  82.5 MBytes  692 Mbits/sec
[ 1] 7.0000-8.0000 sec  82.4 MBytes  691 Mbits/sec
[ 1] 8.0000-9.0000 sec  82.4 MBytes  691 Mbits/sec
[ 1] 9.0000-9.5506 sec  45.0 MBytes  686 Mbits/sec
[ 1] 0.0000-9.5506 sec   783 MBytes  688 Mbits/sec

--- Non Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001
[ ID] Interval        Transfer      Bandwidth
[ 1] 0.0000-1.0000 sec  70.5 MBytes  591 Mbits/sec
[ 1] 1.0000-2.0000 sec  64.5 MBytes  541 Mbits/sec
[ 1] 2.0000-3.0000 sec  73.6 MBytes  618 Mbits/sec
[ 1] 3.0000-4.0000 sec  73.6 MBytes  618 Mbits/sec
[ 1] 4.0000-5.0000 sec  72.9 MBytes  611 Mbits/sec
[ 1] 5.0000-6.0000 sec  73.4 MBytes  616 Mbits/sec
[ 1] 6.0000-7.0000 sec  73.5 MBytes  617 Mbits/sec
[ 1] 7.0000-8.0000 sec  73.4 MBytes  616 Mbits/sec
[ 1] 8.0000-9.0000 sec  73.4 MBytes  616 Mbits/sec
[ 1] 9.0000-10.0000 sec  73.9 MBytes  620 Mbits/sec
[ 1] 0.0000-10.0174 sec  723 MBytes  605 Mbits/sec

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

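For reference, a minimal page-pool RX setup looks roughly like the sketch below; the parameter values are illustrative and not necessarily what this patch uses:

    #include <net/page_pool.h>

    /* Minimal page-pool RX setup sketch: one frame per page, DMA mapping
     * and device sync handled by the pool.  All values are illustrative. */
    static struct page_pool *example_rx_pool_create(struct device *dev,
                                                    unsigned int ring_size)
    {
            struct page_pool_params pp_params = {
                    .order     = 0,                /* one frame per page */
                    .pool_size = ring_size,
                    .nid       = NUMA_NO_NODE,
                    .dev       = dev,
                    .dma_dir   = DMA_FROM_DEVICE,
                    .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                    .max_len   = PAGE_SIZE,
            };

            return page_pool_create(&pp_params);   /* ERR_PTR on failure */
    }

    /* Refill:  page = page_pool_dev_alloc_pages(pool);
     * Release: page_pool_put_full_page(pool, page, true); */
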
2022-10-03  bnx2x: fix potential memory leak in bnx2x_tpa_stop()  (Jianglei Nie, 1 file, -0/+1)

bnx2x_tpa_stop() allocates a memory chunk for new_data with bnx2x_frag_alloc(). new_data should be freed when an error occurs, but when "pad + len > fp->rx_buf_size" is true, bnx2x_tpa_stop() returns without releasing new_data, which leads to a memory leak. Free new_data with bnx2x_frag_free() when "pad + len > fp->rx_buf_size" is true.

Fixes: 07b0f00964def8af9321cfd6c4a7e84f6362f728 ("bnx2x: fix possible panic under memory stress")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

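A hedged approximation of the fix; the surrounding code is paraphrased from the commit message, not copied from the driver:

    /* Sketch: the early-exit path must release the freshly allocated
     * buffer; previously this return leaked new_data. */
    if (pad + len > fp->rx_buf_size) {
            bnx2x_frag_free(fp, new_data);  /* added: free buffer on error */
            goto drop;                      /* hypothetical error label */
    }
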
2022-10-03  eth: lan743x: reject extts for non-pci11x1x devices  (Raju Lakkaraju, 1 file, -0/+7)

Remove PTP_PF_EXTTS support for non-PCI11x1x devices, since they do not support the PTP-IO input-event-triggered timestamping mechanism that was added.

Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

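A sketch of the idea using the standard PTP pin-verify callback; the struct and flag names below are illustrative, not the driver's:

    /* Reject PTP_PF_EXTTS on hardware without PTP-IO inputs. */
    static int example_ptp_verify(struct ptp_clock_info *info, unsigned int pin,
                                  enum ptp_pin_function func, unsigned int chan)
    {
            struct example_adapter *adapter =
                    container_of(info, struct example_adapter, ptp_info);

            /* extts relies on PTP-IO inputs, present only on PCI11x1x parts */
            if (func == PTP_PF_EXTTS && !adapter->is_pci11x1x)
                    return -EOPNOTSUPP;

            return 0;
    }
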
2022-10-03  net: prestera: acl: Add check for kmemdup  (Jiasheng Jiang, 3 files, -5/+13)

As kmemdup can return NULL, it is better to check the return value and return an error on failure. Moreover, the return value of prestera_acl_ruleset_keymask_set() should be checked and propagated up the call chain.

Fixes: 604ba230902d ("net: prestera: flower template support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

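A hedged sketch of the pattern; prestera_acl_ruleset_keymask_set() is named in the commit message, the surrounding variable names are assumed:

    /* Check the duplication, then propagate any error from keymask_set(). */
    keymask = kmemdup(tmpl_keymask, keymask_size, GFP_KERNEL);
    if (!keymask)
            return -ENOMEM;            /* previously unchecked */

    err = prestera_acl_ruleset_keymask_set(ruleset, keymask);
    if (err)
            goto err_keymask_set;      /* propagate the error up the chain */
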
2022-10-03  net: sparx5: Fix return type of sparx5_port_xmit_impl  (Nathan Huckleberry, 2 files, -3/+3)

The ndo_start_xmit field in net_device_ops is expected to be of type:
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev);
The mismatched return type breaks forward-edge kCFI, since the underlying function definition does not match the function hook definition. The return type of sparx5_port_xmit_impl should be changed from int to netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

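A sketch of the corrected signature; the function body is elided:

    /* The implementation's return type must match the ndo_start_xmit hook
     * exactly for kCFI to validate the indirect call. */
    static netdev_tx_t sparx5_port_xmit_impl(struct sk_buff *skb,
                                             struct net_device *dev)
    {
            /* ... transmit logic ... */
            return NETDEV_TX_OK;   /* a netdev_tx_t value, not a plain int */
    }
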
2022-10-01  net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues  (Maxim Mikityanskiy, 14 files, -205/+52)

In the initial implementation of XSK in mlx5e, XSK RQs coexisted with regular RQs in the same channel. The main idea was to allow RSS to work the same way for regular traffic, without the need to reconfigure RSS to exclude XSK queues.

However, this scheme didn't prove to be beneficial, mainly because of incompatibility with other vendors. Some tools don't properly support using higher indices for XSK queues, and some tools get confused by the doubled number of RQs exposed in sysfs. Some use cases are purely XSK, and allocating the same number of unused regular RQs is a waste of resources.

This commit changes the queuing scheme to the standard one, where XSK RQs replace regular RQs on the channels where XSK sockets are open. Two RQs still exist in the channel to allow a failsafe disable of XSK, but only one is exposed at a time. The next commit will achieve the desired memory saving by flushing the buffers when the regular RQ is unused.

As a result of this transition:
1. It's possible to use RSS contexts over XSK RQs.
2. It's possible to dedicate all queues to XSK.
3. When XSK RQs coexist with regular RQs, the admin should make sure no unwanted traffic goes into the XSK RQs by either excluding them from RSS or setting up the XDP program to return XDP_PASS for non-XSK traffic.
4. When using a mixed fleet of mlx5e devices and other netdevs, the same configuration can be applied. If the application supports the fallback to copy mode on unsupported drivers, it will work too.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Introduce the mlx5e_flush_rq function  (Maxim Mikityanskiy, 3 files, -24/+29)

Add a function to flush an RQ: clean up descriptors, release pages and reset the RQ. This procedure is used by the recovery flow, and it will also be used in a following commit to free some memory when switching a channel to XSK mode.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Support XDP metadata on XSK RQs  (Maxim Mikityanskiy, 1 file, -8/+12)

Add support for XDP metadata on XSK RQs for cross-program communication. The driver no longer calls xdp_set_data_meta_invalid, and it copies the metadata to the newly allocated SKB on XDP_PASS.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Optimize RQ page deallocation  (Maxim Mikityanskiy, 2 files, -19/+24)

mlx5e_free_rx_mpwqe loops over all pages of an MPWQE, calling mlx5e_page_release for the ones that are not scheduled for XDP_TX or XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a regular one for each page/XSK frame. This check can be moved outside the loop to reduce the number of branches.

mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release for the ones that are last in a page; and mlx5e_page_release checks whether it's an XSK RQ or a regular one for each fragment. Using the fact that XSK doesn't support multiple fragments, it can be optimized for both XSK and regular usages:
1. Make an early check for XSK and call its deallocator directly, saving 3 branches (loop condition, frag->last_in_page and selection of deallocator).
2. Call the regular deallocator directly in the non-XSK case, saving a branch per fragment, except the first one.

After the changes, mlx5e_page_release is removed, as there are no callers left.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Call mlx5e_page_release_dynamic directly where possible  (Maxim Mikityanskiy, 1 file, -16/+4)

mlx5e_page_release calls the appropriate deallocator depending on whether it's an XSK RQ or a regular one. Some flows that call this function are not compatible with XSK, so they can call the non-XSK deallocator directly to save a branch.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Use non-XSK page allocator in SHAMPO  (Maxim Mikityanskiy, 1 file, -11/+1)

The SHAMPO flow is not compatible with XSK, so it can call the page pool allocator directly to save a branch. mlx5e_page_alloc is removed, as it's no longer used in any flow.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ  (Maxim Mikityanskiy, 4 files, -48/+106)

XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on striding RQ and creates an optimized flow for XSK. A side effect is an opportunity to optimize the regular RX flow by dropping branching for XSK cases.

Performance improvement is up to 6.4% in the aligned mode and up to 7.5% in the unaligned mode.

Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ  (Maxim Mikityanskiy, 4 files, -0/+55)

XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on legacy RQ, adding a special case for XSK. The new branch introduced basically replaces the branch that was removed from the same place a few commits before.

A check is made that DMA sync is not needed, because the batching allocator falls back to returning one frame when DMA sync is needed, and this is best handled by the loop in the standard case.

Performance improvement is up to 8% in the aligned mode and up to 9% in the unaligned mode.

Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ  (Maxim Mikityanskiy, 3 files, -4/+34)

Allocation of XSK frames on legacy RQ can be made more efficient with a specialized routine that relies on certain assumptions: there is only one fragment, and allocation units (XSK frames) are not shared among multiple packets. This reduces the number of branches both in the XSK code and in the regular RQ, because with this approach there is only a single check of whether it's an XSK or a regular RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs  (Maxim Mikityanskiy, 2 files, -22/+17)

Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As partial batches are allowed, there is no point in having a loop within a loop, so the outer loop is removed, and the batch size is increased up to the total number of WQEs to allocate, while still being no smaller than 8.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Use partial batches in legacy RQ with XSK  (Maxim Mikityanskiy, 1 file, -13/+1)

The previous commit allowed allocating WQE batches in legacy RQ partially; however, XSK still checks whether there are enough frames in the fill ring. Remove this check to allow allocating partial batches with XSK as well.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Use partial batches in legacy RQ  (Maxim Mikityanskiy, 1 file, -18/+21)

Legacy RQ allocates WQEs in batches. If the batch allocation fails, the pages of the allocated part are released. This commit changes this behavior to allow using the pages that have already been allocated.

After this change, we need to be careful about indexing rq->wqe.frags[]. The WQ size is a power of two that is divisible by wqe_bulk (8), and the old code used whole bulks, which allowed using indices [8*K; 8*K+7] without overflowing. Now that the bulks may be partial, the range can start at any location (not only at 8*K), so we need to wrap them around to avoid out-of-bounds array access.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

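A rough sketch of the wrap-around described above; the helper name is illustrative, the real driver uses its own WQ helpers:

    /* With a power-of-two ring size, masking is equivalent to a modulo but
     * cheaper; this wraps any partial bulk safely. */
    static inline u16 example_wqe_frag_index(u16 ix, u16 wq_size /* 2^n */)
    {
            return ix & (wq_size - 1);
    }
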
2022-10-01  net/mlx5e: Make the wqe_index_mask calculation more exact  (Maxim Mikityanskiy, 1 file, -1/+20)

The old calculation of wqe_index_mask may give false positives, i.e. request bulking of pairs of WQEs when not strictly needed. For example, when the first fragment size is equal to PAGE_SIZE, bulking is not needed, even if the number of fragments is odd. Make the calculation more exact to cut false positives.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Introduce wqe_index_mask for legacy RQ  (Maxim Mikityanskiy, 2 files, -4/+22)

When fragments of different WQEs share the same page, mlx5e_post_rx_wqes must wait until the old WQE stops using the page; only then can the new WQE allocate the new page. Essentially, it means that if WQE index i is still in use, the allocation must stop before `i % bulk`, where bulk is the number of WQEs that may share the same page.

As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the new wqe_index_mask field will be equal to `bulk - 1`. At the same time, wqe_bulk remains for optimization purposes and stores `max(bulk, 8)`, which allows skipping the allocation until we have at least 8 WQEs free.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

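A hedged sketch of how the two values relate; the field names come from the commit text, the struct layout is assumed:

    /* bulk = number of WQEs that may share one page (a power of two) */
    rq->wqe.info.wqe_index_mask = bulk - 1;       /* i % bulk == i & mask */
    rq->wqe.info.wqe_bulk = max_t(u8, 8, bulk);   /* skip refills below 8 free WQEs */
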
2022-10-01  net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup  (Maxim Mikityanskiy, 2 files, -4/+1)

The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates that XSK queues are open, but not necessarily activated. This check is not very useful, because:
0. Both XSK setup and netdev state transitions take the same state_lock mutex, so they can't happen at the same time.
1. If the netdev is up, xsk_is_bound can return true only when MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel. mlx5e_xsk_wakeup is only called when xsk_is_bound is true.
2. If the XSK socket is bound, and the netdev is going up or down, mlx5e_xsk_wakeup can take one of two branches, depending on the return value of napi_if_scheduled_mark_missed:
2.1. True means one of two things: either NAPI was enabled at this point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was disabled, and nothing really happened.
2.2. False means that NAPI was enabled by this point, which also implies MLX5E_CHANNEL_STATE_XSK was set.

Additionally, mlx5e_xsk_wakeup contains a subsequent check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels. As checking this flag doesn't cut any flows, remove the check.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup  (Maxim Mikityanskiy, 1 file, -3/+1)

mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking a spinlock to protect from concurrent access. There is already a function that does the same: mlx5e_trigger_napi_icosq. Use this function in mlx5e_xsk_wakeup.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear  (Daniel Golle, 1 file, -1/+1)

Setting the ib1 state to MTK_FOE_STATE_UNBIND in the __mtk_foe_entry_clear routine, as done by commit 0e80707d94e4c8 ("net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear"), breaks flow offloading, at least on older MTK_NETSYS_V1 SoCs; OpenWrt users have confirmed the bug on MT7622 and MT7621 systems. Felix Fietkau suggested using MTK_FOE_STATE_INVALID instead, which works well on both MTK_NETSYS_V1 and MTK_NETSYS_V2.

Tested on MT7622 (Linksys E8450) and MT7986 (BananaPi BPI-R3).

Suggested-by: Felix Fietkau <nbd@nbd.name>
Fixes: 0e80707d94e4c8 ("net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear")
Fixes: 33fc42de33278b ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/YzY+1Yg0FBXcnrtc@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  nfp: add support for restart of link auto-negotiation  (Fei Qin, 1 file, -0/+33)

Add support for restarting link auto-negotiation. This may be initiated using:
# ethtool -r <intf>

Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  nfp: add support for link auto negotiation  (Yinjun Zhang, 4 files, -5/+33)

Report the auto-negotiation capability if it's supported in the management firmware, and advertise it if it's enabled. Changing port speed is not allowed when autoneg is enabled. The ethtool <intf> command displays the auto-neg capability:

# ethtool enp1s0np0
Settings for enp1s0np0:
        Supported ports: [ FIBRE ]
        Supported link modes: Not reported
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None RS BASER
        Advertised link modes: Not reported
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None RS BASER
        Speed: 25000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Link detected: yes

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  nfp: refine the ABI of getting `sp_indiff` info  (Yinjun Zhang, 5 files, -65/+71)

Whether the application firmware is indifferent to port speed is a property of the firmware rather than of a port, so use a new rtsym to get the property instead of parsing per-port TLV caps. With this change, the relevant code is moved to the `nfp_main` layer.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  nfp: avoid halt of driver init process when non-fatal error happens  (Yinjun Zhang, 1 file, -5/+4)

It's not a fatal error when setting `hwinfo` into the management firmware fails, so there is no need to halt the whole driver initialization process.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  nfp: add support for reporting active FEC mode  (Yinjun Zhang, 3 files, -2/+11)

The latest management firmware can now report the active FEC mode. Adapt the driver accordingly so that the user can get the active FEC mode by running:
# ethtool --show-fec <intf>

Also correct the use of the `fec` field.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  r8169: add rtl_disable_rxdvgate()  (Chunhao Lin, 1 file, -6/+10)

rtl_disable_rxdvgate() is used to disable RXDV_GATE. It is the opposite of rtl_enable_rxdvgate(). Disabling RXDV_GATE does not require a delay, so this patch also removes the delay after disabling RXDV_GATE.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Link: https://lore.kernel.org/r/20220928171356.3951-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Clean up and fix error flows in mlx5e_alloc_rq  (Maxim Mikityanskiy, 1 file, -5/+7)

Although mlx5e_rq_free_shampo can be called unconditionally, it belongs to the MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ case. Move it there to allow adding more init/cleanup actions to the striding RQ case.

Also, if xdp_rxq_info_reg_mem_model fails, don't forget to destroy the page pool.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Move repeating clear_bit in mlx5e_rx_reporter_err_rq_cqe_recover  (Maxim Mikityanskiy, 1 file, -5/+2)

The same clear_bit is called in both error and success flows. Move the call to do it only once and remove the out label.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Split out channel (de)activation in rx_res  (Maxim Mikityanskiy, 1 file, -53/+53)

To decrease the nesting level and reduce duplication of code, create functions to redirect direct RQTs to the actual RQs or drop_rq, which are used in the activation and deactivation flows of channels.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: xsk: Remove mlx5e_xsk_page_alloc_pool  (Maxim Mikityanskiy, 2 files, -13/+5)

mlx5e_xsk_page_alloc_pool became a thin wrapper around xsk_buff_alloc. Drop it and call xsk_buff_alloc directly.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Convert struct mlx5e_alloc_unit to a union  (Maxim Mikityanskiy, 4 files, -32/+30)

struct mlx5e_alloc_unit consists of a single union. Convert it to a union itself to simplify casting it to struct xdp_buff *, which will be used to implement XSK batching on striding RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Remove DMA address from mlx5e_alloc_unit  (Maxim Mikityanskiy, 3 files, -25/+40)

mlx5e_alloc_unit stores the DMA address and a pointer to either struct page (regular RQ) or struct xdp_buff (XSK RQ). This DMA address is redundant, because when a page or an XSK frame is allocated, the same address is also stored there. Some flows take the address from struct mlx5e_alloc_unit, and some take it from struct page or xdp_buff.

This commit removes the address from struct mlx5e_alloc_unit, which makes it twice as small and improves locality (this struct is used in an array), also saving on unnecessary stores to the addr field. Almost all flows know unambiguously whether the DMA address should be taken from page or from xdp_buff. The exception is the allocation flows, where a new branch appeared, which will be optimized out in the next commits.

struct mlx5e_alloc_unit used to be called mlx5e_dma_info.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Rename mlx5e_dma_info to prepare for removal of DMA address  (Maxim Mikityanskiy, 5 files, -111/+123)

The next commit will remove the DMA address from the struct currently called mlx5e_dma_info, because the same value can be retrieved with page_pool_get_dma_addr(page) in almost all cases, with the notable exception of SHAMPO (HW GRO implementation) that modifies this address on the fly, after the initial allocation.

To keep the SHAMPO logic intact, struct mlx5e_dma_info remains in the SHAMPO code, consisting of addr and page (XSK is not compatible with SHAMPO). The struct used in all other places is renamed to mlx5e_alloc_unit, allowing the next commit to remove the addr field without affecting SHAMPO. The new name means "allocation unit", and it's more appropriate after the field with the DMA address gets removed.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Optimize the page cache reducing its size 2x  (Maxim Mikityanskiy, 3 files, -8/+6)

The RX page cache stores dma_info structs that consist of a pointer to struct page and a DMA address. In fact, the DMA address is extracted from struct page using page_pool_get_dma_addr when a page is pushed to the cache. By moving this call to the point when a page is popped from the cache, we can avoid storing the DMA address in the cache, effectively halving the cache's size without losing any functionality.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

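A sketch of the change; page_pool_get_dma_addr is the existing page-pool helper, while the cache layout shown is illustrative:

    /* Recover the DMA address when a page is popped from the cache instead
     * of storing it alongside the page. */
    struct page *page = cache->pages[cache->head];      /* pop */
    dma_addr_t addr = page_pool_get_dma_addr(page);     /* no longer stored */
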
2022-09-30  net/mlx5e: Fix calculations for ICOSQ size  (Maxim Mikityanskiy, 1 file, -1/+13)

WQEs must not cross page boundaries; they are padded with NOPs if they don't fit the page. mlx5e_mpwrq_total_umr_wqebbs doesn't take this padding into account, risking reserving too little space.

The padding is not straightforward to add to this calculation, because WQEs of different sizes may be mixed together in the queue. If each page ends with a big WQE that doesn't fit and requires at most its size minus 1 WQEBB of padding, the total space can be much bigger than in the case when smaller WQEs take advantage of this padding.

Replace the wrong exact calculation with the following estimation. Each padding can be at most the size of the maximum WQE used in the queue minus one WQEBB. Let's call the rest of the page "useful space". If we divide the total size of all needed WQEs by this useful space, rounding up, we'll get the number of pages, which is enough to contain all these WQEs. It's correct, because every WQE that appeared on the boundary between two blocks of useful space would start in the useful space of one page and end in the padding of the same page, while our estimation reserved space for its tail in the next page's useful space, making the estimation not smaller than the real space occupied in the queue.

The code actually uses a looser estimation: instead of taking the maximum size of all used WQE types minus 1 WQEBB, it takes the maximum hardware size of a WQE. It's made for simplicity and extensibility.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

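A hedged sketch of the estimation in code; DIV_ROUND_UP and MLX5_SEND_WQE_MAX_WQEBBS are existing kernel/mlx5 names, the other identifiers are assumed:

    /* Per-page "useful space": reserve worst-case padding of one maximum
     * hardware WQE minus one WQEBB at the end of each page. */
    useful = wqebbs_per_page - (MLX5_SEND_WQE_MAX_WQEBBS - 1);
    pages  = DIV_ROUND_UP(total_wqebbs, useful);  /* pages that surely fit all WQEs */
    icosq_wqebbs = pages * wqebbs_per_page;       /* never underestimates the need */
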
2022-09-30  net/mlx5e: xsk: Use KSM for unaligned XSK  (Maxim Mikityanskiy, 8 files, -100/+185)

UMR MTTs used in striding RQ have certain alignment requirements. While it's guaranteed to work when UMR pages are aligned to the UMR page size, in practice it works when UMR pages are aligned to 8 bytes. However, that is still not enough flexibility for the unaligned mode of XSK.

This patch leverages KSM to map UMR pages without alignment requirements when unaligned XSK is active. The downside is that KSM entries are twice as big as MTTs, which limits the maximum WQE size, so regular RQs and aligned XSK continue using MTTs.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5: Add MLX5_FLEXIBLE_INLEN to safely calculate cmd inlen  (Maxim Mikityanskiy, 1 file, -0/+30)

Some commands use a flexible array after a common header. Add a macro to safely calculate the total input length of the command, detecting overflows and printing errors with specific values when such overflows happen.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

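A hedged approximation of what such a calculation needs to guard against; check_mul_overflow/check_add_overflow come from <linux/overflow.h>, while the helper name is illustrative, not the mlx5 macro itself:

    #include <linux/overflow.h>

    /* Total input length = header + flexible array, with overflow checks. */
    static inline size_t example_flexible_inlen(size_t hdr_sz, size_t elem_sz,
                                                size_t count)
    {
            size_t arr_sz, total;

            if (check_mul_overflow(elem_sz, count, &arr_sz) ||
                check_add_overflow(hdr_sz, arr_sz, &total))
                    return SIZE_MAX;  /* caller detects this and logs the inputs */
            return total;
    }
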
2022-09-30  net/mlx5e: Keep a separate MKey for striding RQ  (Maxim Mikityanskiy, 2 files, -10/+13)

Currently, rq->mkey_be keeps a big-endian value of either the PA MKey (for legacy RQ, no address translation) or the MTT MKey (for striding RQ, direct address translation). Striding RQ stores the same value in rq->umr_mkey in the native endianness. The next commit will make striding RQ use the KSM MKey (indirect address translation) for the unaligned mode of XSK, which will require storing both the KSM MKey and the PA MKey in the RQ struct.

This commit optimizes the fields of mlx5e_rq: umr_mkey is removed (it's redundant), mkey_be always points to the PA MKey, and mpwqe.umr_mkey_be points to the MTT MKey (or to the KSM MKey, starting from the next commit).

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: xsk: Use XSK frame size as striding RQ page size  (Maxim Mikityanskiy, 1 file, -6/+31)

XSK RQs support striding RQ linear mode, but the stride size is always set to PAGE_SIZE. It may be larger than the XSK frame size, unnecessarily reducing the useful space in a WQE, but more importantly causing UMEM data corruption in certain cases.

Normally, a stride size bigger than the XSK frame size is not a problem if the hardware enforces the MTU. However, traffic between vports skips the hardware MTU check, and oversized packets may be received. If an oversized packet is bigger than the XSK frame but not bigger than the stride, it will cause overwriting of the adjacent UMEM region. If the packet takes more than one stride, those strides can be recycled for reuse, so it's not a problem when the XSK frame size matches the stride size.

To reduce the impact of the above issue, attempt to use an MTT page size for striding RQ that matches the XSK frame size, allowing the safe use of 2048-byte frames on up-to-date firmware.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>