summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2023-06-08net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 portsShay Drory1-0/+4
multiport eswitch LAG is not supported over more than two ports. Add a check in order to block multiport eswitch LAG over such devices. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08net/mlx5: LAG, block multipath LAG in case ldev have more than 2 portsShay Drory1-0/+4
multipath LAG is not supported over more than two ports. Add a check in order to block multipath LAG over such configurations. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08net/mlx5: LAG, change mlx5_shared_fdb_supported() to staticShay Drory2-2/+1
mlx5_shared_fdb_supported() is used only in a single file. Change the function to be static. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08net/mlx5: LAG, generalize handling of shared FDBShay Drory1-28/+38
Shared FDB handling is using the assumption that shared FDB can only be created from two devices. In order to support shared FDB of more than two devices, iterate over all LAG ports instead of hard coding only the first two LAG ports whenever handling shared FDB. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08net/mlx5: LAG, check if all eswitches are paired for shared FDBShay Drory2-1/+12
Shared FDB LAG can only work if all eswitches are paired. Also, whenever two eswitches are paired, devcom is marked as ready. Therefore, in case of device with two eswitches, checking devcom was sufficient. However, this is not correct for device with more than two eswitches, which will be introduced in downstream patch. Hence, check all eswitches are paired explicitly. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08{net/RDMA}/mlx5: introduce lag_for_each_peerShay Drory2-14/+31
Introduce a generic APIs to iterate over all the devices which are part of the LAG. This API replace mlx5_lag_get_peer_mdev() which retrieve only a single peer device from the lag. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net: dwmac_socfpga: initialize local data for mdio regmap configurationMaxime Chevallier1-6/+9
Explicitly zero-ize the local mdio_regmap_config data, and explicitly set the .autoscan parameter, as we only have a PCS on this bus. Fixes: 5d1f3fe7d2d5 ("net: stmmac: dwmac-sogfpga: use the lynx pcs driver") Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07net: altera_tse: explicitly disable autoscan on the regmap-mdio busMaxime Chevallier1-0/+1
Set the .autoscan flag to false on the regmap-mdio bus, to avoid using a random uninitialized value. We don't want autoscan in this case as the mdio device is a PCS and not a PHY. Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx") Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07net: stmmac: make the pcs_lynx cleanup sequence specific to dwmac_socfpgaMaxime Chevallier3-5/+13
So far, only the dwmac_socfpga variant of stmmac uses PCS Lynx. Use a dedicated cleanup sequence for dwmac_socfpga instead of using the generic stmmac one. Fixes: 5d1f3fe7d2d5 ("net: stmmac: dwmac-sogfpga: use the lynx pcs driver") Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07net: altera_tse: Use the correct Kconfig option for the PCS_LYNX dependencyMaxime Chevallier1-1/+1
Use the correct Kconfig dependency for altera_tse as PCS_ALTERA_TSE was replaced by PCS_LYNX. Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07net: altera-tse: Initialize local structs before using itMaxime Chevallier1-0/+2
The regmap_config and mdio_regmap_config objects needs to be zeroed before using them. This will cause spurious errors at probe time as config->pad_bits is containing random uninitialized data. Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07net: txgbe: Avoid passing uninitialised parameter to pci_wake_from_d3()Simon Horman1-5/+3
txgbe_shutdown() relies on txgbe_dev_shutdown() to initialise wake by passing it by reference. However, txgbe_dev_shutdown() doesn't use this parameter at all. wake is then passed uninitialised by txgbe_dev_shutdown() to pci_wake_from_d3(). Resolve this problem by: * Removing the unused parameter from txgbe_dev_shutdown() * Removing the uninitialised variable wake from txgbe_dev_shutdown() * Passing false to pci_wake_from_d3() - this assumes that although uninitialised wake was in practice false (0). I'm not sure that this counts as a bug, as I'm not sure that it manifests in any unwanted behaviour. But in any case, the issue was introduced by: 3ce7547e5b71 ("net: txgbe: Add build support for txgbe") Flagged by Smatch as: .../txgbe_main.c:486 txgbe_shutdown() error: uninitialized symbol 'wake'. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: liquidio: fix mixed module-builtin objectMasahiro Yamada14-3/+89
With CONFIG_LIQUIDIO=m and CONFIG_LIQUIDIO_VF=y (or vice versa), $(common-objs) are linked to a module and also to vmlinux even though the expected CFLAGS are different between builtins and modules. This is the same situation as fixed by commit 637a642f5ca5 ("zstd: Fixing mixed module-builtin objects"). Introduce the new module, liquidio-core, to provide the common functions to liquidio and liquidio-vf. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07ice: make writes to /dev/gnssX synchronousMichal Schmidt4-72/+6
The current ice driver's GNSS write implementation buffers writes and works through them asynchronously in a kthread. That's bad because: - The GNSS write_raw operation is supposed to be synchronous[1][2]. - There is no upper bound on the number of pending writes. Userspace can submit writes much faster than the driver can process, consuming unlimited amounts of kernel memory. A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS buffers instead of first one") would add one more problem: - The possibility of waiting for a very long time to flush the write work when doing rmmod, softlockups. To fix these issues, simplify the implementation: Drop the buffering, the write_work, and make the writes synchronous. I tested this with gpsd and ubxtool. [1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf "User interface" slide. [2] A comment in drivers/gnss/core.c:gnss_write(): /* Ignoring O_NONBLOCK, write_raw() is synchronous. */ [3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/ Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07pds_core: Fix FW recovery detectionBrett Creeley1-2/+8
Commit 523847df1b37 ("pds_core: add devcmd device interfaces") included initial support for FW recovery detection. Unfortunately, the ordering in pdsc_is_fw_good() was incorrect, which was causing FW recovery to be undetected by the driver. Fix this by making sure to update the cached fw_status by calling pdsc_is_fw_running() before setting the local FW gen. Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06qed/qede: Fix scheduling while atomicManish Chopra4-4/+60
Statistics read through bond interface via sysfs causes below bug and traces as it triggers the bonding module to collect the slave device statistics while holding the spinlock, beneath that qede->qed driver statistics flow gets scheduled out due to usleep_range() used in PTT acquire logic [ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021 [ 3673.988878] Call Trace: [ 3673.988891] dump_stack_lvl+0x34/0x44 [ 3673.988908] __schedule_bug.cold+0x47/0x53 [ 3673.988918] __schedule+0x3fb/0x560 [ 3673.988929] schedule+0x43/0xb0 [ 3673.988932] schedule_hrtimeout_range_clock+0xbf/0x1b0 [ 3673.988937] ? __hrtimer_init+0xc0/0xc0 [ 3673.988950] usleep_range+0x5e/0x80 [ 3673.988955] qed_ptt_acquire+0x2b/0xd0 [qed] [ 3673.988981] _qed_get_vport_stats+0x141/0x240 [qed] [ 3673.989001] qed_get_vport_stats+0x18/0x80 [qed] [ 3673.989016] qede_fill_by_demand_stats+0x37/0x400 [qede] [ 3673.989028] qede_get_stats64+0x19/0xe0 [qede] [ 3673.989034] dev_get_stats+0x5c/0xc0 [ 3673.989045] netstat_show.constprop.0+0x52/0xb0 [ 3673.989055] dev_attr_show+0x19/0x40 [ 3673.989065] sysfs_kf_seq_show+0x9b/0xf0 [ 3673.989076] seq_read_iter+0x120/0x4b0 [ 3673.989087] new_sync_read+0x118/0x1a0 [ 3673.989095] vfs_read+0xf3/0x180 [ 3673.989099] ksys_read+0x5f/0xe0 [ 3673.989102] do_syscall_64+0x3b/0x90 [ 3673.989109] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3673.989115] RIP: 0033:0x7f8467d0b082 [ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 [ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082 [ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003 [ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00 [ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0 [ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05 [ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G W OE Fix this by collecting the statistics asynchronously from a periodic delayed work scheduled at default stats coalescing interval and return the recent copy of statisitcs from .ndo_get_stats64(), also add ability to configure/retrieve stats coalescing interval using below commands - ethtool -C ethx stats-block-usecs <val> ethtool -c ethx Fixes: 133fac0eedc3 ("qede: Add basic ethtool support") Cc: Sudarsana Kalluru <skalluru@marvell.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Manish Chopra <manishc@marvell.com> Link: https://lore.kernel.org/r/20230605112600.48238-1-manishc@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06Merge tag 'mlx5-updates-2023-05-31' of ↵Jakub Kicinski24-271/+603
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-05-31 net/mlx5: Support 4 ports VF LAG, part 1/2 This series continues the series[1] "Support 4 ports HCAs LAG mode" by Mark Bloch. This series adds support for 4 ports VF LAG (single FDB E-Switch). This series of patches focuses on refactoring different sections of the code that make assumptions about VF LAG supporting only two ports. For instance, it assumes that each device can only have one peer. Patches 1-5: - Refactor ETH handling of TC rules of eswitches with peers. Patch 6: - Refactors peer miss group table. Patches 7-9: - Refactor single FDB E-Switch creation. Patch 10: - Refactor the DR layer. Patches 11-14: - Refactors devcom layer. Next series will refactor LAG layer and enable 4 ports VF LAG. This series specifically allows HCAs with 4 ports to create a VF LAG with only 4 ports. It is not possible to create a VF LAG with 2 or 3 ports using HCAs that have 4 ports. Currently, the Merged E-Switch feature only supports HCAs with 2 ports. However, upcoming patches will introduce support for HCAs with 4 ports. In order to activate VF LAG a user can execute: devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev devlink dev eswitch set pci/0000:08:00.2 mode switchdev devlink dev eswitch set pci/0000:08:00.3 mode switchdev ip link add name bond0 type bond ip link set dev bond0 type bond mode 802.3ad ip link set dev eth2 master bond0 ip link set dev eth3 master bond0 ip link set dev eth4 master bond0 ip link set dev eth5 master bond0 Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively. User can verify LAG state and type via debugfs: /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type [1] https://lore.kernel.org/netdev/20220510055743.118828-1-saeedm@nvidia.com/ * tag 'mlx5-updates-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices net/mlx5: Devcom, introduce devcom_for_each_peer_entry net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpaired net/mlx5: Devcom, Rename paired to ready net/mlx5: DR, handle more than one peer domain net/mlx5: E-switch, generalize shared FDB creation net/mlx5: E-switch, Handle multiple master egress rules net/mlx5: E-switch, refactor FDB miss rule add/remove net/mlx5: E-switch, enlarge peer miss group table net/mlx5e: Handle offloads flows per peer net/mlx5e: en_tc, re-factor query route port net/mlx5e: rep, store send to vport rules per peer net/mlx5e: tc, Refactor peer add/del flow net/mlx5e: en_tc, Extend peer flows to a list ==================== Link: https://lore.kernel.org/r/20230602191301.47004-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05net: stmmac: dwmac-qcom-ethqos: fix a regression on EMAC < 3Bartosz Golaszewski1-1/+2
We must not assign plat_dat->dwmac4_addrs unconditionally as for structures which don't set them, this will result in the core driver using zeroes everywhere and breaking the driver for older HW. On EMAC < 2 the address should remain NULL. Fixes: b68376191c69 ("net: stmmac: dwmac-qcom-ethqos: Add EMAC3 support") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Do not query MAX_VRS on each iterationPetr Machata1-4/+8
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module. Instead of making the call on every iteration, cache it up front, and use the value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Do not query MAX_RIFS on each iterationPetr Machata1-2/+4
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module. Instead of making the call on every iteration, cache it up front, and use the value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()Petr Machata1-2/+2
In commit 26029225d992 ("mlxsw: spectrum_router: Propagate extack further"), the mlxsw_sp_rif_ops.configure callback got a new argument, extack. However the callbacks that deal with tunnel configuration, mlxsw_sp1_rif_ipip_lb_configure() and mlxsw_sp2_rif_ipip_lb_configure(), were never updated to pass the parameter further. Do that now. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Clarify a commentPetr Machata1-2/+2
"Reserved for X" usually means that only X is supposed to use a given object. Here, it is used in the sense that X should consider the object "reserved", as in "restricted". Replace the comment simply by "X", with the implication that that's where the field is used. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05net: stmmac: dwmac-sogfpga: use the lynx pcs driverMaxime Chevallier8-316/+83
dwmac_socfpga re-implements support for the TSE PCS, which is identical to the already existing TSE PCS, which in turn is the same as the Lynx PCS. Drop the existing TSE re-implemenation and use the Lynx PCS instead, relying on the regmap-mdio driver to translate MDIO accesses into mmio accesses. Add a lynx_pcs reference in the stmmac's internal structure, and use .mac_select_pcs() to return the relevant PCS to be used. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05net: ethernet: altera-tse: Convert to mdio-regmap and use PCS LynxMaxime Chevallier2-9/+50
The newly introduced regmap-based MDIO driver allows for an easy mapping of an mdiodevice onto the memory-mapped TSE PCS, which is actually a Lynx PCS. Convert Altera TSE to use this PCS instead of the pcs-altera-tse, which is nothing more than a memory-mapped Lynx PCS. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04net: enetc: correct rx_bytes statistics of XDPWei Fang1-0/+8
The rx_bytes statistics of XDP are always zero, because rx_byte_cnt is not updated after it is initialized to 0. So fix it. Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04net: enetc: correct the statistics of rx bytesWei Fang1-1/+7
The rx_bytes of struct net_device_stats should count the length of ethernet frames excluding the FCS. However, there are two problems with the rx_bytes statistics of the current enetc driver. one is that the length of VLAN header is not counted if the VLAN extraction feature is enabled. The other is that the length of L2 header is not counted, because eth_type_trans() is invoked before updating rx_bytes which will subtract the length of L2 header from skb->len. BTW, the rx_bytes statistics of XDP path also have similar problem, I will fix it in another patch. Fixes: a800abd3ecb9 ("net: enetc: move skb creation into enetc_build_skb") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-02net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two ↵Shay Drory3-4/+19
devices mlx5_devcom_send_event is used to send event from one eswitch to the other. In other words, only one event is sent, which means, no error mechanism is needed. However, In case devcom have more than two eswitches, a proper error mechanism is needed. Hence, in case of error, devcom will perform the error unwind, since devcom knows how many events were successful. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: Devcom, introduce devcom_for_each_peer_entrySaeed Mahameed7-99/+209
Introduce generic APIs which will retrieve all peers. This API replace mlx5_devcom_get/release_peer_data which retrieve only a single peer. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpairedShay Drory2-1/+8
Whenever an eswitch is unpaired with another, the driver mark devcom as not ready. While this is correct in case we are pairing only two eswitches, in order to support pairing of more than two eswitches, driver need to mark devcom as not ready only when all eswitches are unpaired. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: Devcom, Rename paired to readyShay Drory6-22/+22
In downstream patch devcom will provide support for more than two devices. The term 'paired' will be renamed as 'ready' to convey a more accurate meaning. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: DR, handle more than one peer domainShay Drory12-30/+42
Currently, DR domain is using the assumption that each domain can only have a single peer. In order to support VF LAG of more then two ports, expand peer domain to use an array of peers, and align the code accordingly. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, generalize shared FDB creationShay Drory5-26/+66
Shared FDB creation is hard coded for only two eswitches. Generalize shared FDB creation so that any number of eswitches could create shared FDB. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, Handle multiple master egress rulesShay Drory3-36/+79
Currently, whenever a shared FDB is created, the slave eswitch is creating master egress rule to the master eswitch. In order to support more than two ports, which means there will be more than one slave eswitch, enlarge bounce_rule, which is used to create master egress rule, to an xarray. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, refactor FDB miss rule add/removeShay Drory2-5/+6
Currently, E-switch FDB have a single peer miss rule. In order to support more than one peer, refactor E-switch FDB to have peer miss rule per peer, and change the code to add/remove a rule from specific peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, enlarge peer miss group tableShay Drory1-3/+4
There is an implicit assumption that peer miss group table require to handle only a single peer. Also, there is an assumption that total_vports of the master is greater or equal to the total_vports of each peer. Change the code to support peer miss group for more than one peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: Handle offloads flows per peerShay Drory4-11/+34
Currently, E-switch offloads table have a list of all flows that create a peer_flow over the peer eswitch. In order to support more than one peer, extend E-switch offloads table peer_flow to hold an array of lists, where each peer have dedicate index via mlx5_get_dev_index(). Thereafter, extend original flow to hold an array of peers as well. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: en_tc, re-factor query route portMark Bloch1-19/+13
query for peer esw outside of if scope. This is preparation for query route port over multiple peers. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: rep, store send to vport rules per peerMark Bloch3-26/+98
Each representor, for each send queue, is holding a send_to_vport rule for the peer eswitch. In order to support more than one peer, and to map between the peer rules and peer eswitches, refactor representor to hold both the peer rules and pointer to the peer eswitches. This enables mlx5 to store send_to_vport rules per peer, where each peer have dedicate index via mlx5_get_dev_index(). Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: tc, Refactor peer add/del flowShay Drory1-31/+34
Move peer_eswitch outside mlx5e_tc_add_fdb_peer_flow() so downstream patch can call mlx5e_tc_add_fdb_peer_flow() with multiple peers. Move peer_eswitch in the remove flow as well in order to keep symmetry. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: en_tc, Extend peer flows to a listMark Bloch2-17/+28
Currently, mlx5e_flow is holding a pointer to a peer_flow, in case one was created. e.g. There is an assumption that mlx5e_flow can have only one peer. In order to support more than one peer, refactor mlx5e_flow to hold a list of peer flows. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net: lan743x: Remove extranous gotosMoritz Fischer1-15/+5
The gotos for cleanup aren't required, the function might as well just return the actual error code. Signed-off-by: Moritz Fischer <moritzf@google.com> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-02net: systemport: Replace platform_get_irq with platform_get_irq_optionalJiasheng Jiang1-2/+2
Replace platform_get_irq with platform_get_irq_optional because wol_irq is optional. Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-02r8169: use dev_err_probe in all appropriate places in rtl_init_one()Heiner Kallweit1-25/+15
In addition to properly handling probe deferrals dev_err_probe() conveniently combines printing an error message with returning the errno. So let's use it for every error path in rtl_init_one() to simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/f0596a19-d517-e301-b649-304f9247b75a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02devlink: bring port new reply backJiri Pirko2-4/+8
In the offending fixes commit I mistakenly removed the reply message of the port new command. I was under impression it is a new port notification, partly due to the "notify" in the name of the helper function. Bring the code sending reply with new port message back, this time putting it directly to devlink_nl_cmd_port_new_doit() Fixes: c496daeb8630 ("devlink: remove duplicate port notification") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230531142025.2605001-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski28-221/+290
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/sfc/tc.c 622ab656344a ("sfc: fix error unwinds in TC offload") b6583d5e9e94 ("sfc: support TC decap rules matching on enc_src_port") net/mptcp/protocol.c 5b825727d087 ("mptcp: add annotations around msk->subflow accesses") e76c8ef5cc5b ("mptcp: refactor mptcp_stream_accept()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01Merge tag 'mlx5-fixes-2023-05-31' of ↵Jakub Kicinski3-14/+12
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-05-31 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Read embedded cpu after init bit cleared net/mlx5e: Fix error handling in mlx5e_refresh_tirs net/mlx5: Ensure af_desc.mask is properly initialized net/mlx5: Fix setting of irq->map.index for static IRQ case net/mlx5: Remove rmap also in case dynamic MSIX not supported ==================== Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01ice: recycle/free all of the fragments from multi-buffer frameMaciej Fijalkowski1-1/+1
The ice driver caches next_to_clean value at the beginning of ice_clean_rx_irq() in order to remember the first buffer that has to be freed/recycled after main Rx processing loop. The end boundary is indicated by first descriptor of frame that Rx processing loop has ended its duties. Note that if mentioned loop ended in the middle of gathering multi-buffer frame, next_to_clean would be pointing to the descriptor in the middle of the frame BUT freeing/recycling stage will stop at the first descriptor. This means that next iteration of ice_clean_rx_irq() will miss the (first_desc, next_to_clean - 1) entries. When running various 9K MTU workloads, such splats were observed: [ 540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 540.787787] #PF: supervisor read access in kernel mode [ 540.793002] #PF: error_code(0x0000) - not-present page [ 540.798218] PGD 0 P4D 0 [ 540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G W 6.3.0-rc7+ #96 [ 540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice] [ 540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98 [ 540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282 [ 540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe [ 540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff [ 540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100 [ 540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004 [ 540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040 [ 540.890253] FS: 00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000 [ 540.898465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0 [ 540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 540.926032] PKRU: 55555554 [ 540.928790] Call Trace: [ 540.931276] <TASK> [ 540.933418] ice_napi_poll+0x4ca/0x6d0 [ice] [ 540.937804] ? __pfx_ice_napi_poll+0x10/0x10 [ice] [ 540.942716] napi_busy_loop+0xd7/0x320 [ 540.946537] xsk_recvmsg+0x143/0x170 [ 540.950178] sock_recvmsg+0x99/0xa0 [ 540.953729] __sys_recvfrom+0xa8/0x120 [ 540.957543] ? do_futex+0xbd/0x1d0 [ 540.961008] ? __x64_sys_futex+0x73/0x1d0 [ 540.965083] __x64_sys_recvfrom+0x20/0x30 [ 540.969155] do_syscall_64+0x38/0x90 [ 540.972796] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 540.977934] RIP: 0033:0x7f6de5f27934 To fix this, set cached_ntc to first_desc so that at the end, when freeing/recycling buffers, descriptors from first to ntc are not missed. Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01net: renesas: rswitch: Fix return value in error path of xmitYoshihiro Shimoda1-1/+1
Fix return value in the error path of rswitch_start_xmit(). If TX queues are full, this function should return NETDEV_TX_BUSY. Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/20230529073817.1145208-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01RDMA/mana_ib: Use v2 version of cfg_rx_steer_req to enable RX coalescingLong Li1-1/+4
With RX coalescing, one CQE entry can be used to indicate multiple packets on the receive queue. This saves processing time and PCI bandwidth over the CQ. The MANA Ethernet driver also uses the v2 version of the protocol. It doesn't use RX coalescing and its behavior is not changed. Link: https://lore.kernel.org/r/1684045095-31228-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-01chelsio: Convert chtls_sendpage() to use MSG_SPLICE_PAGESDavid Howells1-102/+7
Convert chtls_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Ayush Sawal <ayush.sawal@chelsio.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netdev@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>