summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2021-08-19i40e: Fix ATR queue selectionArkadiusz Kubalewski1-2/+1
Without this patch, ATR does not work. Receive/transmit uses queue selection based on SW DCB hashing method. If traffic classes are not configured for PF, then use netdev_pick_tx function for selecting queue for packet transmission. Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx, which ensures that packet is transmitted/received from CPU that is running the application. Reproduction steps: 1. Load i40e driver 2. Map each MSI interrupt of i40e port for each CPU 3. Disable ntuple, enable ATR i.e.: ethtool -K $interface ntuple off ethtool --set-priv-flags $interface flow-director-atr 4. Run application that is generating traffic and is bound to a single CPU, i.e.: taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10 5. Observe behavior: Application's traffic should be restricted to the CPU provided in taskset. Fixes: 89ec1f0886c1 ("i40e: Fix queue-to-TC mapping on Tx") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19net: ethernet: ti: cpsw: make array stpa static const, makes object smallerColin Ian King1-1/+1
Don't populate the array stpa on the stack but instead it static const. Makes the object code smaller by 81 bytes: Before: text data bss dec hex filename 54993 17248 0 72241 11a31 ./drivers/net/ethernet/ti/cpsw_new.o After: text data bss dec hex filename 54784 17376 0 72160 119e0 ./drivers/net/ethernet/ti/cpsw_new.o (gcc version 10.3.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net: hns3: make array spec_opcode static const, makes object smallerColin Ian King1-11/+13
Don't populate the array spec_opcode on the stack but instead it static const. Makes the object code smaller by 158 bytes: Before: text data bss dec hex filename 12271 3976 128 16375 3ff7 .../hisilicon/hns3/hns3pf/hclge_cmd.o After: text data bss dec hex filename 12017 4072 128 16217 3f59 .../hisilicon/hns3/hns3pf/hclge_cmd.o (gcc version 10.3.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19hinic: make array speeds static const, makes object smallerColin Ian King1-2/+4
Don't populate the array speeds on the stack but instead it static const. Makes the object code smaller by 17 bytes: Before: text data bss dec hex filename 39987 14200 64 54251 d3eb .../huawei/hinic/hinic_sriov.o After: text data bss dec hex filename 39906 14264 64 54234 d3da .../huawei/hinic/hinic_sriov.o (gcc version 10.3.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net: pch_gbe: remove mii_ethtool_gset() error handlingPavel Skripkin2-10/+2
mii_ethtool_gset() does not return any errors, so error handling can be omitted to make code more simple. Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add tx_counters to struct ravb_hw_infoBiju Das2-1/+4
The register for retrieving TX counters is present only on R-Car Gen3 and RZ/G2L; it is not present on R-Car Gen2. Add the tx_counters hw feature bit to struct ravb_hw_info, to enable this feature specifically for R-Car Gen3 now and later extend it to RZ/G2L. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add internal delay hw feature to struct ravb_hw_infoBiju Das2-2/+7
R-Car Gen3 supports TX and RX clock internal delay modes, whereas R-Car Gen2 and RZ/G2L do not support it. Add an internal_delay hw feature bit to struct ravb_hw_info to enable this only for R-Car Gen3. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add net_features and net_hw_features to struct ravb_hw_infoBiju Das2-4/+10
On R-Car the checksum calculation on RX frames is done by the E-MAC module, whereas on RZ/G2L it is done by the TOE. TOE calculates the checksum of received frames from E-MAC and outputs it to DMAC. TOE also calculates the checksum of transmission frames from DMAC and outputs it E-MAC. Add net_features and net_hw_features to struct ravb_hw_info, to support subsequent SoCs without any code changes in the ravb_probe function. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add gstrings_stats and gstrings_size to struct ravb_hw_infoBiju Das2-1/+10
The device stats strings for R-Car and RZ/G2L are different. R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of "rx_queue_0_missed_errors". Add structure variables gstrings_stats and gstrings_size to struct ravb_hw_info, so that subsequent SoCs can be added without any code changes in the ravb_get_strings function. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add stats_len to struct ravb_hw_infoBiju Das2-3/+7
R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of "rx_queue_0_missed_errors". Replace RAVB_STATS_LEN macro with a structure variable stats_len to struct ravb_hw_info, to support subsequent SoCs without any code changes to the ravb_get_sset_count function. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add max_rx_len to struct ravb_hw_infoBiju Das2-4/+7
The maximum descriptor size that can be specified on the reception side for R-Car is 2048 bytes, whereas for RZ/G2L it is 8096. Add the max_rx_len variable to struct ravb_hw_info for allocating different RX skb buffer sizes for R-Car and RZ/G2L using the netdev_alloc_skb function. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add aligned_tx to struct ravb_hw_infoBiju Das2-1/+3
R-Car Gen2 needs a 4byte aligned address for the transmission buffer, whereas R-Car Gen3 doesn't have any such restriction. Add aligned_tx to struct ravb_hw_info to select the driver to choose between aligned and unaligned tx buffers. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Add struct ravb_hw_info to driver dataBiju Das2-13/+28
The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are similar to the R-Car Ethernet AVB IP. With a few changes in the driver we can support both IPs. This patch adds the struct ravb_hw_info to hold hw features, driver data and function pointers to support both the IPs. It also replaces the driver data chip type with struct ravb_hw_info by moving chip type to it. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19ravb: Use unsigned int for num_tx_desc variable in struct ravb_privateBiju Das2-15/+15
The number of TX descriptors per packet is an unsigned value and the variable for holding this information should be unsigned. This patch replaces the data type of num_tx_desc variable in struct ravb_private from 'int' to 'unsigned int'. This patch also updates the data type of local variables to unsigned int, where the local variables are evaluated using num_tx_desc. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU portVladimir Oltean1-0/+1
Currently we are unable to ping a bridge on top of a felix switch which uses the ocelot-8021q tagger. The packets are dropped on the ingress of the user port and the 'drop_local' counter increments (the counter which denotes drops due to no valid destinations). Dumping the PGID tables, it becomes clear that the PGID_SRC of the user port is zero, so it has no valid destinations. But looking at the code, the cpu_fwd_mask (the bit mask of DSA tag_8021q ports) is clearly missing from the forwarding mask of ports that are under a bridge. So this has always been broken. Looking at the version history of the patch, in v7 https://patchwork.kernel.org/project/netdevbpf/patch/20210125220333.1004365-12-olteanv@gmail.com/ the code looked like this: /* Standalone ports forward only to DSA tag_8021q CPU ports */ unsigned long mask = cpu_fwd_mask; (...) } else if (ocelot->bridge_fwd_mask & BIT(port)) { mask |= ocelot->bridge_fwd_mask & ~BIT(port); while in v8 (the merged version) https://patchwork.kernel.org/project/netdevbpf/patch/20210129010009.3959398-12-olteanv@gmail.com/ it looked like this: unsigned long mask; (...) } else if (ocelot->bridge_fwd_mask & BIT(port)) { mask = ocelot->bridge_fwd_mask & ~BIT(port); So the breakage was introduced between v7 and v8 of the patch. Fixes: e21268efbe26 ("net: dsa: felix: perform switch setup for tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210817160425.3702809-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19PCI: Change the type of probe argument in reset functionsAmey Narkhede1-1/+1
Change the type of probe argument in functions which implement reset methods from int to bool to make the context and intent clear. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-10-ameynarkhede03@gmail.com Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-19net/mlx4: Use ARRAY_SIZE to get an array's sizeJason Wang1-1/+1
The ARRAY_SIZE macro is defined to get an array's size which is more compact and more formal in linux source. Thus, we can replace the long sizeof(arr)/sizeof(arr[0]) with the compact ARRAY_SIZE. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20210817121106.44189-1-wangborong@cdjrlc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18octeontx2-pf: Allow VLAN priority also in ntuple filtersSubbaraya Sundeep1-5/+0
VLAN TCI is a 16 bit field which includes Priority(3 bits), CFI(1 bit) and VID(12 bits). Currently ntuple filters support installing rules to steer packets based on VID only. This patch extends that support such that filters can be installed for entire VLAN TCI. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error pathWang Hai1-1/+4
In ixgbe_xsk_pool_enable(), if ixgbe_xsk_wakeup() fails, We should restore the previous state and clean up the resources. Add the missing clear af_xdp_zc_qps and unmap dma to fix this bug. Fixes: d49e286d354e ("ixgbe: add tracking of AF_XDP zero-copy state for each queue pair") Fixes: 4a9b32f30f80 ("ixgbe: fix potential RX buffer starvation for AF_XDP") Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20210817203736.3529939-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18PCI: Remove reset_fn field from pci_devAmey Narkhede1-1/+1
"reset_fn" indicates whether the device supports any reset mechanism. Remove the use of reset_fn in favor of the reset_methods array that tracks supported reset mechanisms of a device and their ordering. The octeon driver incorrectly used reset_fn to detect whether the device supports FLR or not. Use pcie_reset_flr() to probe whether it supports FLR. Co-developed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-5-ameynarkhede03@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-08-17i40e: Fix spelling mistake "dissable" -> "disable"Colin Ian King1-1/+1
There is a spelling mistake in a dev_info message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-17iavf: use mutexes for locking of critical sectionsStefan Assmann3-63/+56
As follow-up to the discussion with Jakub Kicinski about iavf locking being insufficient [1] convert iavf to use mutexes instead of bitops. The locking logic is kept as is, just a drop-in replacement of enum iavf_critical_section_t with separate mutexes. The only difference is that the mutexes will be destroyed before the module is unloaded. [1] https://lwn.net/ml/netdev/20210316150210.00007249%40intel.com/ Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-17net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32Dinghao Liu1-1/+3
qlcnic_83xx_unlock_flash() is called on all paths after we call qlcnic_83xx_lock_flash(), except for one error path on failure of QLCRD32(), which may cause a deadlock. This bug is suggested by a static analysis tool, please advise. Fixes: 81d0aeb0a4fff ("qlcnic: flash template based firmware reset recovery") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20210816131405.24024-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17octeontx2-af: configure npc for cn10k to allow packets from cptVidya1-1/+11
On CN10K, the higher bits in the channel number represents the CPT channel number. Mask out these higher bits in the npc configuration to allow packets from cpt for parsing. Signed-off-by: Vidya <vvelumuri@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-af: cn10K: Get NPC counters valueHariprasad Kelam2-8/+20
The way SW can identify the number NPC counters supported by silicon has changed for CN10K. This patch addresses this reading appropriate registers to find out number of counters available. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-af: Allocate low priority entries for PFSubbaraya Sundeep1-0/+12
If the mcam entry allocation request is from PF and NOT a priority allocation request then allocate low priority entries so that PF entries always have lower priority than its VFs. This is required so that entries with (base) MCAM match criteria have lower priority compared to entries with (base + additional) match criteria. This patch considers only best case scenario where PF entries are allocated from low priority zone if low priority zone has free space. There are worst case scenarios like: 1. VFs allocating hundreds of MCAM entries leading to VFs using all mid priority zone and low priority zone entries hence no entries free from low priority zone for PF. 2. All the PFs and VFs in the system allocating and freeing entries causing fragmentation in MCAM space and all the entries requested by PF could not fit in low priority zone for allocation. This patch do not handle worst case scenarios. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-pf: devlink params support to set mcam entry countSunil Goutham8-17/+221
Added support for setting or modifying MCAM entry count at runtime via devlink params. commands: devlink dev param show pci/0002:02:00.0: name mcam_count type driver-specific values: cmode runtime value 16 devlink dev param set pci/0002:02:00.0 name mcam_count value 64 cmode runtime Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-pf: Unify flow management variablesSunil Goutham4-46/+80
Variables used for TC flow management like maximum number of flows, number of flows installed etc are a copy of ntuple flow management variables. Since both TC and NTUPLE are not supported at the same time, it's better to unify these with common variables. This patch addresses this unification and also does cleanup of other minor stuff wrt TC. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-pf: Sort the allocated MCAM entry indicesSunil Goutham1-0/+15
Per single mailbox request a maximum of 256 MCAM entries can be allocated. If more than 256 are being allocated, then the mcam indices in the final list could get jumbled. Hence sort the indices. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-pf: Ntuple filters support for VF netdevRakesh Babu5-60/+98
Add packet flow classification support for both LMAC mapped virtual functions and loopback VFs. This patch adds supports for ntuple offload feature. Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-pf: Enable NETIF_F_RXALL support for VF driverSunil Goutham2-4/+4
Enabled NETIF_F_RXALL support for VF driver. Also removed MTU range comments which are no longer valid. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-af: Add debug messages for failuresSunil Goutham1-19/+73
Added debug messages for various failures during probe. This will help in quickly identifying the API where the failure is happening. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-af: add proper return codes for AF mailbox handlersNaveen Mamindlapalli4-21/+47
Add appropriate error codes to be used when returning from AF mailbox handlers due to some error condition. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17octeontx2-af: Modify install flow error codesSubbaraya Sundeep2-8/+15
When installing a flow using npc_install_flow mailbox there are number of reasons to reject the request like caller is not permitted, invalid channel specified in request, flow not supported in extraction profile and so on. Hence define new error codes for npc flows and use them instead of generic error codes. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17Merge tag 'mlx5-updates-2021-08-16' of ↵David S. Miller19-716/+1696
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-08-16 The following patchset provides two separate mlx5 updates 1) Ethtool RSS context and MQPRIO channel mode support: 1.1) enable mlx5e netdev driver to allow creating Transport Interface RX (TIRs) objects on the fly to be used for ethtool RSS contexts and TX MQPRIO channel mode 1.2) Introduce mlx5e_rss object to manage such TIRs. 1.3) Ethtool support for RSS context 1.4) Support MQPRIO channel mode 2) Bridge offloads Lag support: to allow adding bond net devices to mlx5 bridge 2.1) Address bridge port by (vport_num, esw_owner_vhca_id) pair since vport_num is only unique per eswitch and in lag mode we need to manage ports from both eswitches. 2.2) Allow connectivity between representors of different eswitch instances that are attached to same bridge 2.3) Bridge LAG, Require representors to be in shared FDB mode and introduce local and peer ports representors, match on paired eswitch metadata in peer FDB entries, And finally support addition/deletion and aging of peer flows. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17net/mlx5: Bridge, support LAGVlad Buslov3-48/+159
Allow adding bond net devices to mlx5 bridge with following changes: - Modify bridge representor code to obtain uplink represetor that belongs to eswitch that is registered for notification. Require representor to be in shared FDB mode. If representor is the lag master, then consider its port as local, otherwise treat it as peer. - Use devcom to match on paired eswitch metadata in peer FDB entries. This is necessary for shared FDB LAG to function since packets are always received on active eswitch instance as opposed to parent eswitch of port. - Support for deleting peer flows when receiving SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous patches in series. Now also implement support for handling SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp to current jiffies when receiving such notification on eswitch that manages the local FDB entry. This allows peer entries to prevent ageing of the FDB. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5: Bridge, allow merged eswitch connectivityVlad Buslov5-28/+112
Allow connectivity between representors of different eswitch instances that are attached to same bridge when merged_eswitch capability is enabled. Add ports of peer eswitch to bridge instance and mark them with MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event. Listen to the event on mlx5 bridge implementation and delete peer FDBs in event handler. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5: Bridge, extract FDB delete notification to functionVlad Buslov1-14/+13
SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in bridge code. Following patch in series changes the condition for the notification. Extract the notification into dedicated helper function mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the future changes. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pairVlad Buslov6-208/+263
Following patches in series allow traffic between vports of different eswitch instances, which requires addressing bridge port by vport_num+esw_owner_vhca_id pair since vport_num is only unique per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with 'esw_owner_vhca_id' field and use it as part of key for mlx5_esw_bridge->vports xarray. With this change we can't rely on switchdev_handle_port_obj_add() helper to get mlx5 representor from stacked device because we need specifically representor from parent eswitch that registered the callback to obtain correct esw_owner_vhca_id. The helper doesn't allow passing additional parameters to predicate function and doesn't provide access to the notifier block to obtain eswitch through br_offloads. Implement custom helpers to obtain mlx5 representor and use them in mlx5_esw_bridge_port_obj_{add|del|attr_set}() implementations. Remove direct pointer to parent bridge from struct mlx5_vport as it is no longer needed. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5: Bridge, obtain core device from eswitch instead of privVlad Buslov1-4/+2
Following patches in series will pass bond device to bridge, which means the code can't assume the device is mlx5 representor. Moreover, the core device can be easily obtained from eswitch instance, so there is no reason for more complex code that obtains struct mlx5_priv from net_device in order to use its mdev. Refactor the code to use esw->dev instead of priv->mdev. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5: Bridge, release bridge in same function where it is takenVlad Buslov1-7/+9
Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if mlx5_esw_bridge_vport_init() returned an error instead of relying on it to release the bridge. This improves the design because object instance is taken and released in same layer and simplifies following patches that add more logic to mlx5_esw_bridge_vport_link(). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Support MQPRIO channel modeTariq Toukan3-8/+102
Add support for MQPRIO channel mode, in which a partition to TCs is defined over the channels. We allow partitions with contiguous queue indices, with no holes within. We do not allow modification to the num of channels while this MQPRIO mode is active. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Handle errors of netdev_set_num_tc()Tariq Toukan1-6/+14
Add handling for failures in netdev_set_num_tc(). Let mlx5e_netdev_set_tcs return an int. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Maintain MQPRIO mode parameterTariq Toukan2-17/+28
This is in preparation for supporting MQPRIO CHANNEL mode in downstream patch, in addition to DCB mode that's supported today. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Abstract MQPRIO paramsTariq Toukan6-25/+37
Abstract the MQPRIO params into a struct. Use a getter for DCB mode num_tcs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Support flow classification into RSS contextsTariq Toukan5-21/+131
Extend the existing flow classification support, to steer flows not only directly to a receive ring, but also into the new RSS contexts. Create needed TIR objects on demand, and hold reference on the RSS context. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Support multiple RSS contextsTariq Toukan5-51/+273
Add support to multiple RSS contexts. Resources of the non-default RSS contexts are allocated and created on demand. Each RSS context can be controlled and configured separately, via the implemented ethtool ops. Here we limit the num of total contexts to 16. We do not enforce any kind of new limitation over the indirection table content. More specifically, two separate contexts can be configured to fully or partially point to the same set of receive rings. The default RSS context (index 0) is created with its full set of TIRs. All other contexts are created with an empty set, then TIRs are added upon first usage when steering rules are added. We use a reference counting mechanism to make sure an RSS context is not removed before the rules pointing to it. Block ethtool set_channels operations when multiple RSS contexts exist, as currently the kernel doesn't protect against inconsistent channels configs that break non-default RSS contexts. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Dynamically allocate TIRs in RSS contextsTariq Toukan1-13/+56
Move from static to dynamic memory allocations for TIR. This is in preparation to supporting on-demand TIR operations in downstream patches, where every RSS context will be init with an empty set of TIRs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Convert RSS to a dedicated objectTariq Toukan5-428/+604
Code related to RSS is now encapsulated into a dedicated object and put into new files en/rss.{c,h}. All usages are converted. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-17net/mlx5e: Introduce abstraction of RSS contextTariq Toukan3-73/+105
Bring all fields that define and maintain RSS behavior together into a new structure. Align all usages with this new structure. Keep it hidden within rx_res.c. This helps supporting multiple RSS contexts in downstream patch. Use dynamic allocations for the RSS context. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>