summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-11net: add include/net/net_debug.hEric Dumazet2-140/+152
Remove from include/linux/netdevice.h helpers that send debug/info/warnings to syslog. We plan adding more helpers in following patches. v2: added two includes, and 'struct net_device' forward declaration to avoid compile errors (kernel bots) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11Merge tag 'mlx5-updates-2022-05-09' of ↵David S. Miller22-302/+720
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-05-09 1) Gavin Li, adds exit route from waiting for FW init on device boot and increases FW init timeout on health recovery flow 2) Support 4 ports HCAs LAG mode Mark Bloch Says: ================ This series adds to mlx5 drivers support for 4 ports HCAs. Starting with ConnectX-7 HCAs with 4 ports are possible. As most driver parts aren't affected by such configuration most driver code is unchanged. Specially the only affected areas are: - Lag - Devcom - Merged E-Switch - Single FDB E-Switch Lag was chosen to be converted first. Creating hardware LAG when all 4 ports are added to the same bond device. Devom, merge E-Switch and single FDB E-Switch, are marked as supporting only 2 ports HCAs and future patches will add support for 4 ports HCAs. In order to activate the hardware lag a user can execute the: ip link add bond0 type bond ip link set bond0 type bond miimon 100 mode 2 ip link set eth2 master bond0 ip link set eth3 master bond0 ip link set eth4 master bond0 ip link set eth5 master bond0 Where eth2, eth3, eth4 and eth5 are the PFs of the same HCA. ================ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11Merge branch 'net-phy-add-comments-for-lan8742-phy-support'Jakub Kicinski2-1/+9
Yuiko Oshino says: ==================== net: phy: add comments for LAN8742 phy support Add comments for 0xfffffff2 phy ID mask for the LAN8742 and the LAN88xx, explaining that they can coexist and allow future hardware revisions. Also add one missing tab in smsc.c. ==================== Link: https://lore.kernel.org/r/20220509185804.7147-1-yuiko.oshino@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11net: phy: smsc: add comments for the LAN8742 phy ID mask.Yuiko Oshino1-1/+5
add comments for the LAN8742 phy ID mask in the previous patch. add one missing tab in the LAN8742 phy ID line. Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11net: phy: microchip: add comments for the modified LAN88xx phy ID mask.Yuiko Oshino1-0/+4
add comments for the updated LAN88xx phy ID mask in the previous patch. Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11dt-bindings: net: orion-mdio: Convert to JSON schemaChris Packham2-54/+60
Convert the marvell,orion-mdio binding to JSON schema. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220505210621.3637268-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11Merge branch 'docs-document-some-aspects-of-struct-sk_buff'Jakub Kicinski3-107/+232
Jakub Kicinski says: ==================== docs: document some aspects of struct sk_buff This small set creates a place to render sk_buff documentation, documents one random thing (data-only skbs) and converts the big checksum comment to kdoc. ==================== Link: https://lore.kernel.org/r/20220323233715.2104106-1-kuba@kernel.org/ Link: https://lore.kernel.org/r/20220324231312.2241166-1-kuba@kernel.org/ Link: https://lore.kernel.org/r/20220509160456.1058940-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11skbuff: render the checksum comment to documentationJakub Kicinski2-95/+130
Long time ago Tom added a giant comment to skbuff.h explaining checksums. Now that we have a place in Documentation for skbuff docs we should render it. Sprinkle some markup while at it. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11skbuff: rewrite the doc for data-only skbsJakub Kicinski3-12/+37
The comment about shinfo->dataref split is really unhelpful, at least to me. Rewrite it and render it to skb documentation. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11skbuff: add a basic intro docJakub Kicinski2-0/+65
Add basic skb documentation. It's mostly an intro to the subsequent patches - it would looks strange if we documented advanced topics without covering the basics in any way. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11net: fix kdoc on __dev_queue_xmit()Jakub Kicinski1-20/+15
Commit c526fd8f9f4f21 ("net: inline dev_queue_xmit()") exported __dev_queue_xmit(), now it's being rendered in html docs, triggering: Documentation/networking/kapi:92: net/core/dev.c:4101: WARNING: Missing matching underline for section title overline. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/linux-next/20220503073420.6d3f135d@canb.auug.org.au/ Fixes: c526fd8f9f4f21 ("net: inline dev_queue_xmit()") Link: https://lore.kernel.org/r/20220509170412.1069190-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11Merge branch 'move-siena-into-a-separate-subdirectory'Jakub Kicinski48-122/+24114
Martin Habets says: ==================== Move Siena into a separate subdirectory The Siena NICs (SFN5000 and SFN6000 series) went EOL in November 2021. Most of these adapters have been remove from our test labs, and testing has been reduced to a minimum. This patch series creates a separate kernel module for the Siena architecture, analogous to what was done for Falcon some years ago. This reduces our maintenance for the sfc.ko module, and allows us to enhance the EF10 and EF100 drivers without the risk of breaking Siena NICs. After this series further enhancements are needed to differentiate the new kernel module from sfc.ko, and the Siena code can be removed from sfc.ko. Thes will be posted as a small follow-up series. The Siena module is not built by default, but can be enabled using Kconfig option SFC_SIENA. This will create module sfc-siena.ko. Patches Patches 1-3 establish the code base for the Siena driver. Patches 4-10 ensure the allyesconfig build succeeds. Patch 11 adds the basic Siena module. I do not expect patch 1 through 3 to be reviewed, they are FYI only. No checkpatch issues were resolved as part of these, but they were fixed in the subsequent patches. Testing Various build tests were done such as allyesconfig, W=1 and sparse. The new sfc-siena.ko and sfc.ko modules were tested on a machine with both these NICs in them, and several tests were run on both drivers. ==================== Link: https://lore.kernel.org/r/165211018297.5289.9658523545298485394.stgit@palantir17.mph.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc: Add a basic Siena moduleMartin Habets5-3/+28
Make the (un)load message more specific to differentiate it from the sfc.ko messages. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Inline functions in sriov.h to avoid conflicts with sfcMartin Habets2-77/+63
The implementation of each is quite short. This means sriov.c is not needed any more. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Rename functions in nic_common.h to avoid conflicts with sfcMartin Habets14-138/+95
For siena use efx_siena_ as the function prefix. efx_nic_update_stats_atomic is only used in efx_common.c, so move it there. efx_nic_copy_stats is not used in Siena, so it is removed. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Rename functions in mcdi headers to avoid conflicts with sfcMartin Habets17-609/+459
For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Rename peripheral functions to avoid conflicts with sfcMartin Habets15-270/+270
For siena use efx_siena_ as the function prefix. This patch covers selftest.h, ptp.h, net_driver.h and ethtool_common.h. efx_ethtool_fill_self_tests() can become static. Some functions in ptp.c can also become static. Rename loopback_mode in net_driver.h. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Rename RX/TX functions to avoid conflicts with sfcMartin Habets13-233/+216
For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Use a Siena specific variable name for module parameter efx_separate_tx_channels. Move efx_fini_tx_queue() to avoid a forward declaration of efx_dequeue_buffer(). Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Rename functions in efx headers to avoid conflicts with sfcMartin Habets23-472/+427
When building with allyesconfig there are many identical symbol names. For siena use efx_siena_ as the function and variable prefix to avoid build errors. efx_mtd_remove_partition can become static as it is no longer called from other files. efx_ticks_to_usecs and efx_xmit_done_single are not used in Siena, so they are removed. Several functions are only used inside efx_channels.c for Siena so they can become static. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc/siena: Remove build references to missing functionalityMartin Habets10-458/+17
Functionality not supported or needed on Siena includes: - Anything for EF100 - EF10 specifics such as register access, PIO and TSO offload. Also only bind to Siena NICs. Remove EF10 specifics from nic.h. The functions that start with efx_farch_ will be removed from sfc.ko with a subsequent patch. Add the efx_ prefix to siena_prepare_flush() to make it consistent with the other APIs. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc: Copy shared files needed for Siena (part 2)Martin Habets27-0/+14153
These are the files starting with m through w. No changes are done, those will be done with subsequent commits. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc: Copy shared files needed for Siena (part 1)Martin Habets14-0/+10524
These are the files starting with b through i. No changes are done, those will be done with subsequent commits. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11sfc: Move Siena specific filesMartin Habets4-0/+0
Files are only moved, no changes are made. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11nfp: flower: fix 'variable 'flow6' set but not used'Louis Peens1-12/+7
Kernel test robot reported an issue after a recent patch about an unused variable when CONFIG_IPV6 is disabled. Move the variable declaration to be inside the #ifdef, and do a bit more cleanup. There is no need to use a temporary ipv6 bool value, it is just checked once, remove the extra variable and just do the check directly. Fixes: 9d5447ed44b5 ("nfp: flower: fixup ipv6/ipv4 route lookup for neigh events") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220510074845.41457-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10x25: remove redundant pointer devColin Ian King1-2/+1
Pointer dev is being assigned a value that is never used, the assignment and the variable are redundant and can be removed. Also replace null check with the preferred !ptr idiom. Cleans up clang scan warning: net/x25/x25_proc.c:94:26: warning: Although the value stored to 'dev' is used in the enclosing expression, the value is never actually read from 'dev' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220508214500.60446-1-colin.i.king@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10Merge branch 'this-is-a-patch-series-for-ethernet-driver-of-sunplus-sp7021-soc'Paolo Abeni19-0/+2191
Wells Lu says: ==================== This is a patch series for Ethernet driver of Sunplus SP7021 SoC. Sunplus SP7021 is an ARM Cortex A7 (4 cores) based SoC. It integrates many peripherals (ex: UART, I2C, SPI, SDIO, eMMC, USB, SD card and etc.) into a single chip. It is designed for industrial control applications. Refer to: https://sunplus.atlassian.net/wiki/spaces/doc/overview https://tibbo.com/store/plus1.html ==================== Link: https://lore.kernel.org/r/1652004800-3212-1-git-send-email-wellslutw@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10net: ethernet: Add driver for Sunplus SP7021Wells Lu18-0/+2043
Add driver for Sunplus SP7021 SoC. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10devicetree: bindings: net: Add bindings doc for Sunplus SP7021.Wells Lu2-0/+148
Add bindings documentation for Sunplus SP7021 SoC. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10Merge branch ↵Paolo Abeni11-63/+282
'ptp-support-hardware-clocks-with-additional-free-running-cycle-counter' Gerhard Engleder says: ==================== ptp: Support hardware clocks with additional free running cycle counter ptp vclocks require a clock with free running time for the timecounter. Currently only a physical clock forced to free running is supported. If vclocks are used, then the physical clock cannot be synchronized anymore. The synchronized time is not available in hardware in this case. As a result, timed transmission with TAPRIO hardware support is not possible anymore. If hardware would support a free running time additionally to the physical clock, then the physical clock does not need to be forced to free running. Thus, the physical clocks can still be synchronized while vclocks are in use. The physical clock could be used to synchronize the time domain of the TSN network and trigger TAPRIO. In parallel vclocks can be used to synchronize other time domains. One year ago I thought for two time domains within a TSN network also two physical clocks are required. This would lead to new kernel interfaces for asking for the second clock, ... . But actually for a time triggered system like TSN there can be only one time domain that controls the system itself. All other time domains belong to other layers, but not to the time triggered system itself. So other time domains can be based on a free running counter if similar mechanisms like 2 step synchroisation are used. Synchronisation was tested with two time domains between two directly connected hosts. Each host run two ptp4l instances, the first used the physical clock and the second used the virtual clock. I used my FPGA based network controller as network device. ptp4l was used in combination with the virtual clock support patches from Miroslav Lichvar. v4: - if_index of 0 is invalid (Jonathan Lemon) - set if_index to 0 in the SOF_TIMESTAMPING_RAW_HARDWARE block (Jonathan Lemon) - add helper function for netdev_get_tstamp() call (Jonathan Lemon) - update SKBTX_ANY_TSTAMP (Paolo Abeni) - use separate bits for new tx_flags (Richard Cochran) v3: - optimize ptp_convert_timestamp (Richard Cochran) - call dev_get_by_napi_id() only if needed (Richard Cochran) - use non-negated logical test (Richard Cochran) - add comment for skipped output (Richard Cochran) - add comment for SKBTX_HW_TSTAMP_USE_CYCLES masking (Richard Cochran) v2: - rename ptp_clock cycles to has_cycles (Richard Cochran) - call it free running cycle counter (Richard Cochran) - update struct skb_shared_hwtstamps kdoc (Richard Cochran) - optimize timestamp address/cookie processing path (Richard Cochran, Vinicius Costa Gomes) v1: - complete rework based on suggestions (Richard Cochran) ==================== Link: https://lore.kernel.org/r/20220506200142.3329-1-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10tsnep: Add free running cycle counter supportGerhard Engleder3-7/+63
The TSN endpoint Ethernet MAC supports a free running counter additionally to its clock. This free running counter can be read and hardware timestamps are supported. As the name implies, this counter cannot be set and its frequency cannot be adjusted. Add free running cycle counter support based on this free running counter to physical clock. This also requires hardware time stamps based on that free running counter. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10ptp: Speed up vclock lookupGerhard Engleder2-19/+48
ptp_convert_timestamp() is called in the RX path of network messages. The current implementation takes ~5000ns on 1.2GHz A53. This is too much for the hot path of packet processing. Introduce hash table for fast vclock lookup in ptp_convert_timestamp(). The execution time of ptp_convert_timestamp() is reduced to ~700ns on 1.2GHz A53. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10ptp: Support late timestamp determinationGerhard Engleder3-13/+71
If a physical clock supports a free running cycle counter, then timestamps shall be based on this time too. For TX it is known in advance before the transmission if a timestamp based on the free running cycle counter is needed. For RX it is impossible to know which timestamp is needed before the packet is received and assigned to a socket. Support late timestamp determination by a network device. Therefore, an address/cookie is stored within the new netdev_data field of struct skb_shared_hwtstamps. This address/cookie is provided to a new network device function called ndo_get_tstamp(), which returns a timestamp based on the normal/adjustable time or based on the free running cycle counter. If function is not supported, then timestamp handling is not changed. This mechanism is intended for RX, but TX use is also possible. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10ptp: Pass hwtstamp to ptp_convert_timestamp()Gerhard Engleder3-8/+6
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is a field of the argument with the type struct skb_shared_hwtstamps *. So a pointer to the hwtstamp field of this structure is sufficient. Rework ptp_convert_timestamp() to use an argument of type ktime_t *. This allows to add additional timestamp manipulation stages before the call of ptp_convert_timestamp(). Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10ptp: Request cycles for TX timestampGerhard Engleder2-2/+16
The free running cycle counter of physical clocks called cycles shall be used for hardware timestamps to enable synchronisation. Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to provide a TX timestamp based on cycles if cycles are supported. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10ptp: Add cycles support for virtual clocksGerhard Engleder5-16/+80
ptp vclocks require a free running time for their timecounter. Currently only a physical clock forced to free running is supported. If vclocks are used, then the physical clock cannot be synchronized anymore. The synchronized time is not available in hardware in this case. As a result, timed transmission with TAPRIO hardware support is not possible anymore. If hardware would support a free running time additionally to the physical clock, then the physical clock does not need to be forced to free running. Thus, the physical clocks can still be synchronized while vclocks are in use. The physical clock could be used to synchronize the time domain of the TSN network and trigger TAPRIO. In parallel vclocks can be used to synchronize other time domains. Introduce support for a free running cycle counter called cycles to physical clocks. Rework ptp vclocks to use this free running cycle counter. Default implementation is based on time of physical clock. Thus, behavior of ptp vclocks based on physical clocks without free running cycle counter is identical to previous behavior. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10eth: dpaa2-mac: remove a dead-code NULL check on fwnode parentJakub Kicinski1-3/+0
Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set") @parent can't be NULL after the if. It's either the address of the ->fwnode of @dpmacs or @fwnode in case of ACPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10net/mlx5: Lag, add debugfs to query hardware lag stateMark Bloch5-4/+192
Lag state has become very complicated with many modes, flags, types and port selections methods and future work will add additional features. Add a debugfs to query the current lag state. A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag The following files are exposed: - state: Returns "active" or "disabled". If "active" it means hardware lag is active. - members: Returns the BDFs of all the members of lag object. - type: Returns the type of the lag currently configured. Valid only if hardware lag is active. * "roce" - Members are bare metal PFs. * "switchdev" - Members are in switchdev mode. * "multipath" - ECMP offloads. - port_sel_mode: Returns the egress port selection method, valid only if hardware lag is active. * "queue_affinity" - Egress port is selected by the QP/SQ affinity. * "hash" - Egress port is selected by hash done on each packet. Controlled by: xmit_hash_policy of the bond device. - flags: Returns flags that are specific per lag @type. Valid only if hardware lag is active. * "shared_fdb" - "on" or "off", if "on" single FDB is used. - mapping: Returns the mapping which is used to select egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash" returns the active egress ports. The hash result will select only active ports. if @port_sel_mode is "queue_affinity" returns the mapping between the configured port affinity of the QP/SQ and actual egress port. For example: * 1:1 - Mapping means if the configured affinity is port 1 traffic will egress via port 1. * 1:2 - Mapping means if the configured affinity is port 1 traffic will egress via port 2. This can happen if port 1 is down or in active/backup mode and port 1 is backup. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, use buckets in hash modeMark Bloch4-76/+182
When in hardware lag and the NIC has more than 2 ports when one port goes down need to distribute the traffic between the remaining active ports. For better spread in such cases instead of using 1-to-1 mapping and only 4 slots in the hash, use many. Each port will have many slots that point to it. When a port goes down go over all the slots that pointed to that port and spread them between the remaining active ports. Once the port comes back restore the default mapping. We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots. Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port. The native mapping is such that: port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1] which means if a packet is hased into one of this slots it will hit the wire via port 1. port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2] which means if a packet is hased into one of this slots it will hit the wire via port2. and this mapping is the same of the rest of the ports. On a failover, lets say port 2 goes down (port 1, 3, 4 are still up). the new mapping for port 2 will be: port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4] which means the mapping was changed from the native mapping to a mapping that consists of only the active ports. With this if a port goes down the traffic will be split between the active ports randomly Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, refactor dmesg printMark Bloch1-10/+12
Combine dmesg lag prints into a single function. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Support devices with more than 2 portsMark Bloch3-3/+5
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, use actual number of lag portsMark Bloch3-149/+216
Refactor the entire lag code to use ldev->ports instead of hard-coded defines (like MLX5_MAX_PORTS) for its operations. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, use hash when in roce lag on 4 portsMark Bloch1-9/+36
Downstream patches will add support for lag over 4 ports. In that mode we will only use hash as the uplink selection method. Using hash instead of queue affinity (before this patch) offers key advantages like: - Align ports selection method with the method used by the bond device - Better packets distribution where a single queue can transmit from multiple ports (with queue affinity a queue is bound to a single port regardless of the packet being sent). - In case of failover we traffic is split between multiple ports and not a single one like in queue affinity. Going forward it was decided that queue affinity will be deprecated as using hash provides a better user experience which means on 4 ports HCAs hash will always be used. Future work will add hash support for 2 ports HCAs as well. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, support single FDB only on 2 portsMark Bloch1-0/+4
E-Switch currently doesn't support more than 2 E-Switch managers being aggregated under a single hardware lag. Have specific checks to disallow creating lag when the code doesn't support it. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, store number of ports inside lag objectMark Bloch2-0/+2
Store the number of lag ports inside the lag object. Lag object is a single shared object managing the lag state of multiple mlx5 devices on the same physical HCA. Downstream patches will allow hardware lag to be created over devices with more than 2 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, filter non compatible devicesMark Bloch3-14/+47
When search for a peer lag device we can filter based on that device's capabilities. Downstream patch will be less strict when filtering compatible devices and remove the limitation where we require exact MLX5_MAX_PORTS and change it to a range. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, use lag lockMark Bloch4-65/+35
Use a lag specific lock instead of depending on external locks to synchronise the lag creation/destruction. With this, taking E-Switch mode lock is no longer needed for syncing lag logic. Cleanup any dead code that is left over and don't export functions that aren't used outside the E-Switch core code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, move E-Switch prerequisite check into lag codeMark Bloch3-16/+9
There is no need to expose E-Switch function for something that can be checked with already present API inside lag code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: devcom only supports 2 portsMark Bloch2-7/+11
Devcom API is intended to be used between 2 devices only add this implied assumption into the code and check when it's no true. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Lag, expose number of lag portsMark Bloch6-2/+11
Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Increase FW pre-init timeout for health recoveryGavin Li6-13/+20
Currently, health recovery will reload driver to recover it from fatal errors. During the driver's load process, it would wait for FW to set the pre-init bit for up to 120 seconds, beyond this threshold it would abort the load process. In some cases, such as a FW upgrade on the DPU, this timeout period is insufficient, and the user has no way to recover the host device. To solve this issue, introduce a new FW pre-init timeout for health recovery, which is set to 2 hours. The timeout for devlink reload and probe will use the original one because they are user triggered flows, and therefore should not have a significantly long timeout, during which the user command would hang. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>