author Jakub Kicinski <kuba@kernel.org> 2026-06-15 21:45:05 +0300
committer Jakub Kicinski <kuba@kernel.org> 2026-06-15 21:45:06 +0300
commit 75983f837d20af89df60b4e8f08e5ca4e0a6cb72
tree a08904c346d0902cdec5304aa6ad90f2ee2f2a7c
parent 2319688890d97c63da423a3c57c23b4ab5952dfc
parent 7bcfb19465fca99efd09ecb5d3ef8f91179d7ff1
Merge branch 'net-mlx5-add-switchdev-mode-support-for-socket-direct-single-netdev-part-2-2'
Tariq Toukan says:

====================
net/mlx5: Add switchdev mode support for Socket Direct single netdev, part 2/2

This is part 2. Find part 1 here:
https://lore.kernel.org/all/20260531113954.395443-1-tariqt@nvidia.com/

This series enables Socket Direct single netdev to operate in switchdev
mode with shared FDB. SD single netdev combines multiple PCI functions
behind a single netdev interface. To support switchdev offloads, these
functions must participate in virtual LAG (shared FDB).

Design

Rather than introducing a separate LAG instance for SD, this series
integrates SD secondary devices into the existing LAG structure
(priv.lag) created at probe time. Each lag_func entry carries a group_id
field that identifies its SD group membership (0 means not part of any
SD group).

An xarray mark (XA_MARK_PORT) distinguishes physical port entries from
SD secondaries, enabling a single unified iterator that filters by
group:

- MLX5_LAG_FILTER_PORTS: iterate port-level entries only (existing
  behavior, used by bonding, FW LAG commands, v2p_map)
- MLX5_LAG_FILTER_ALL: iterate all devices including SD secondaries
  (used by MPESW shared FDB across all devices)
- specific group_id: iterate only devices in that SD group (used by
  per-group SD shared FDB operations)

Existing callers use mlx5_ldev_for_each() which maps to
MLX5_LAG_FILTER_PORTS, preserving current behavior for non-SD
configurations.

Lifecycle and ownership

The SD LAG lifecycle is tied to the SD group, not to bonding events:

1. At PCI probe, mlx5_lag_add_mdev() creates the LAG structure
   (priv.lag) for each LAG-capable PF. e.g.: SD primary devices

2. During mlx5_sd_init(), after the SD group is fully formed (primary
   and secondaries paired), sd_lag_init() registers the secondary
   devices into the primary's existing priv.lag by calling
   mlx5_ldev_add_mdev() with the SD group_id. The primary's lag_func
   also gets its group_id set. No separate LAG instance is created.

3. After all the devices in SD group transition to switchdev,
   mlx5_lag_shared_fdb_create() is invoked with the group_id to create
   a software-only shared FDB scoped to that SD group. This sets
   sd_fdb_active on all lag_func entries in the group. No FW LAG
   commands are issued since SD devices share the same physical port.

4. If MPESW (multi-port eswitch) is enabled on top of SD groups, the
   per-group SD shared FDB is torn down first, then MPESW shared FDB is
   created spanning all devices (ports + SD secondaries) using
   MLX5_LAG_FILTER_ALL. On MPESW disable, per-group SD shared FDB is
   restored.

5. On SD teardown (mlx5_sd_cleanup or device unbind), sd_lag_cleanup()
   removes secondaries from priv.lag and clears the primary's group_id.
   The LAG structure itself is not destroyed.

The sd_fdb_active flag is set on all lag_func entries in a group (not
just the primary), so any device can detect the SD shared FDB state
during lag_disable_change teardown without needing to look up peer
entries.

SD shared FDB is a pure software construct -- unlike regular LAG modes
(ROCE, SRIOV, MPESW), it does not issue FW create_lag/destroy_lag
commands.

The software vport LAG for SD is implemented via eswitch egress ACL
bounce rules, managed by the IB layer through mlx5_eth_lag_init().

And the software LAG demux is implemented via steering rules that
utilize new destination, VHCA_RX.

Patches

E-Switch preparation (patch 1):
- Skip uplink IB rep load for SD secondary devices

Devcom support (patches 2-3):
- Expose locked variant of send_event
- Add DEVCOM_CANT_FAIL for non-rollback events

SD core hardening (patches 4-6):
- Make primary/secondary role determination more robust
- Add L2 table silent mode query support
- Expand vport metadata for SD secondary devices

SD switchdev transition (patches 7-8):
- Support switchdev mode transition with shared FDB
- Notify SD on eswitch disable

LAG integration (patches 9-12):
- Store demux resources per master lag_func
- Disable both regular and SD LAG on lag_disable_change
- Introduce software vport LAG implementation
- Add MPESW over SD LAG support

Deferred init (patches 13-14):
- Tie rep load/unload to SD LAG state
- Defer vport metadata init until SD is ready

Enablement (patch 15):
- Enable SD over ECPF and allow switchdev transition

v2: https://lore.kernel.org/20260608135547.482825-1-tariqt@nvidia.com
v1: https://lore.kernel.org/20260604114455.434711-1-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20260612113904.537595-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
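
Illustration (not part of the commit): the unified, filter-based iterator described under "Design" can be modeled in plain userspace C roughly as below. This is only a sketch. The names func_entry, FILTER_PORTS, FILTER_ALL, entry_matches(), next_match() and count_matches() are hypothetical, and the sentinel values chosen for the two special filters are assumptions, since the real MLX5_LAG_FILTER_* encoding does not appear in this diff. The is_port field stands in for the xarray port mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel filters; real group ids must not collide with them. */
#define FILTER_PORTS 0xfffffffeu
#define FILTER_ALL   0xffffffffu

struct func_entry {
	uint32_t group_id; /* SD group id, 0 = not part of any SD group */
	bool is_port;      /* models the port mark on the xarray entry */
};

/* Decide whether one entry is visited under the given filter. */
static bool entry_matches(const struct func_entry *e, uint32_t filter)
{
	if (filter == FILTER_ALL)
		return true;
	if (filter == FILTER_PORTS)
		return e->is_port;
	/* Any other value is treated as a group id; 0 never selects anything. */
	return filter != 0 && e->group_id == filter;
}

/* Index of the first matching entry at or after start, or -1 if none. */
static int next_match(const struct func_entry *tbl, size_t n, size_t start,
		      uint32_t filter)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = start; i < n; i++)
		if (entry_matches(&tbl[i], filter))
			return (int)i;
	return -1;
}

/* Walk the table the way a for-each macro would and count the visits. */
static size_t count_matches(const struct func_entry *tbl, size_t n,
			    uint32_t filter)
{
	size_t count = 0;

	for (int i = next_match(tbl, n, 0, filter); i >= 0;
	     i = next_match(tbl, n, (size_t)i + 1, filter))
		count++;
	return count;
}
```

In this model an SD primary is both a port entry and a group member, so it is visited by the ports filter and by its own group filter, which matches the statement that the primary's lag_func also gets its group_id set.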
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 1
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 9
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 296
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 21
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 2
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c 205
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h 30
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c 95
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h 4
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lag/shared_fdb.c 75
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c 36
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h 5
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c 410
-rw-r--r-- drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h 8
14 files changed, 1075 insertions, 122 deletions
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b531d1c226b0..a0e2ca87b8d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2065,6 +2065,7 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
mlx5_esw_reps_unblock(esw);
esw->mode = MLX5_ESWITCH_LEGACY;
+ mlx5_sd_eswitch_mode_set(esw->dev, MLX5_ESWITCH_LEGACY);
mlx5_lag_enable_change(esw->dev);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 94a530d19828..fea72b1dedab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -440,6 +440,7 @@ struct mlx5_eswitch {
void esw_offloads_disable(struct mlx5_eswitch *esw);
int esw_offloads_enable(struct mlx5_eswitch *esw);
+int mlx5_esw_offloads_init_deferred_metadata(struct mlx5_eswitch *esw);
void esw_offloads_cleanup(struct mlx5_eswitch *esw);
int esw_offloads_init(struct mlx5_eswitch *esw);
@@ -950,11 +951,16 @@ void esw_vport_change_handle_locked(struct mlx5_vport *vport);
bool mlx5_esw_offloads_controller_valid(const struct mlx5_eswitch *esw, u32 controller);
+int mlx5_eswitch_offloads_vport_lag_add_one(struct mlx5_eswitch *master_esw,
+ struct mlx5_eswitch *slave_esw);
+void mlx5_eswitch_offloads_vport_lag_del_one(struct mlx5_eswitch *master_esw,
+ struct mlx5_eswitch *slave_esw);
int mlx5_eswitch_offloads_single_fdb_add_one(struct mlx5_eswitch *master_esw,
struct mlx5_eswitch *slave_esw, int max_slaves);
void mlx5_eswitch_offloads_single_fdb_del_one(struct mlx5_eswitch *master_esw,
struct mlx5_eswitch *slave_esw);
int mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw);
+void mlx5_eswitch_unload_reps(struct mlx5_eswitch *esw);
bool mlx5_eswitch_is_peer(struct mlx5_eswitch *esw,
struct mlx5_eswitch *peer_esw);
@@ -1059,6 +1065,9 @@ mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw)
return 0;
}
+static inline void
+mlx5_eswitch_unload_reps(struct mlx5_eswitch *esw) {}
+
static inline bool
mlx5_eswitch_block_encap(struct mlx5_core_dev *dev, bool from_fdb)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 830fc910a080..907ee83a722d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -46,6 +46,7 @@
#include "fs_core.h"
#include "lib/mlx5.h"
#include "lib/devcom.h"
+#include "lib/sd.h"
#include "lib/eq.h"
#include "lib/fs_chains.h"
#include "en_tc.h"
@@ -2862,6 +2863,10 @@ static int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num)
int rep_type;
int err;
+ if (vport_num != MLX5_VPORT_UPLINK &&
+ mlx5_get_sd(esw->dev) && !mlx5_lag_is_active(esw->dev))
+ return 0;
+
rep = mlx5_eswitch_get_rep(esw, vport_num);
for (rep_type = 0; rep_type < NUM_REP_TYPES; rep_type++) {
err = __esw_offloads_load_rep(esw, rep, rep_type,
@@ -3040,6 +3045,136 @@ static int __esw_set_master_egress_rule(struct mlx5_core_dev *master,
return err;
}
+static int esw_slave_egress_create_resources(struct mlx5_eswitch *esw,
+ struct mlx5_vport *vport)
+{
+ struct mlx5_flow_table_attr ft_attr = {
+ .max_fte = 1, .prio = 0, .level = 0,
+ };
+ int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+ struct mlx5_flow_namespace *ns;
+ struct mlx5_flow_table *acl;
+ struct mlx5_flow_group *g;
+ u32 *flow_group_in;
+ int err = 0;
+
+ if (vport->egress.acl)
+ return 0;
+
+ xa_init_flags(&vport->egress.offloads.bounce_rules, XA_FLAGS_ALLOC);
+ ns = mlx5_get_flow_vport_namespace(esw->dev,
+ MLX5_FLOW_NAMESPACE_ESW_EGRESS,
+ vport->index);
+ if (!ns)
+ return -EINVAL;
+
+ flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+ if (!flow_group_in)
+ return -ENOMEM;
+
+ if (vport->vport || mlx5_core_is_ecpf(esw->dev))
+ ft_attr.flags = MLX5_FLOW_TABLE_OTHER_VPORT;
+
+ acl = mlx5_create_vport_flow_table(ns, &ft_attr, vport->vport);
+ if (IS_ERR(acl)) {
+ err = PTR_ERR(acl);
+ goto out;
+ }
+
+ g = mlx5_create_flow_group(acl, flow_group_in);
+ if (IS_ERR(g)) {
+ err = PTR_ERR(g);
+ goto err_table;
+ }
+
+ vport->egress.acl = acl;
+ vport->egress.offloads.bounce_grp = g;
+ vport->egress.type = VPORT_EGRESS_ACL_TYPE_SHARED_FDB;
+ err = 0;
+
+err_table:
+ if (err && !IS_ERR_OR_NULL(acl)) {
+ mlx5_destroy_flow_table(acl);
+ vport->egress.acl = NULL;
+ }
+out:
+ kvfree(flow_group_in);
+ return err;
+}
+
+static void esw_slave_egress_destroy_resources(struct mlx5_vport *vport)
+{
+ if (!IS_ERR_OR_NULL(vport->egress.offloads.bounce_grp)) {
+ mlx5_destroy_flow_group(vport->egress.offloads.bounce_grp);
+ vport->egress.offloads.bounce_grp = NULL;
+ }
+ if (!IS_ERR_OR_NULL(vport->egress.acl)) {
+ esw_acl_egress_ofld_cleanup(vport);
+ xa_destroy(&vport->egress.offloads.bounce_rules);
+ }
+}
+
+static int esw_set_slave_egress_rule(struct mlx5_core_dev *master,
+ struct mlx5_core_dev *slave)
+{
+ struct mlx5_eswitch *slave_esw = slave->priv.eswitch;
+ u16 master_vhca = MLX5_CAP_GEN(master, vhca_id);
+ struct mlx5_flow_destination dest = {};
+ struct mlx5_flow_handle *bounce_rule;
+ struct mlx5_flow_act flow_act = {};
+ struct mlx5_vport *slave_vport;
+ int err;
+
+ slave_vport = mlx5_eswitch_get_vport(slave_esw,
+ slave_esw->manager_vport);
+ if (IS_ERR(slave_vport))
+ return PTR_ERR(slave_vport);
+
+ err = esw_slave_egress_create_resources(slave_esw, slave_vport);
+ if (err)
+ return err;
+
+ flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+ dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
+ dest.vport.num = master->priv.eswitch->manager_vport;
+ dest.vport.vhca_id = master_vhca;
+ dest.vport.flags = MLX5_FLOW_DEST_VPORT_VHCA_ID;
+
+ bounce_rule = mlx5_add_flow_rules(slave_vport->egress.acl, NULL,
+ &flow_act, &dest, 1);
+ if (IS_ERR(bounce_rule)) {
+ err = PTR_ERR(bounce_rule);
+ goto err_rule;
+ }
+ err = xa_insert(&slave_vport->egress.offloads.bounce_rules,
+ master_vhca, bounce_rule, GFP_KERNEL);
+ if (err)
+ goto err_insert;
+
+ return 0;
+err_insert:
+ mlx5_del_flow_rules(bounce_rule);
+err_rule:
+ esw_slave_egress_destroy_resources(slave_vport);
+ return err;
+}
+
+static void esw_unset_slave_egress_rule(struct mlx5_core_dev *master,
+ struct mlx5_core_dev *slave)
+{
+ struct mlx5_eswitch *slave_esw = slave->priv.eswitch;
+ u16 master_vhca = MLX5_CAP_GEN(master, vhca_id);
+ struct mlx5_vport *slave_vport;
+
+ slave_vport = mlx5_eswitch_get_vport(slave_esw,
+ slave_esw->manager_vport);
+ if (IS_ERR(slave_vport))
+ return;
+
+ esw_acl_egress_ofld_bounce_rule_destroy(slave_vport, master_vhca);
+ esw_slave_egress_destroy_resources(slave_vport);
+}
+
static int esw_master_egress_create_resources(struct mlx5_eswitch *esw,
struct mlx5_flow_namespace *egress_ns,
struct mlx5_vport *vport, size_t count)
@@ -3164,6 +3299,9 @@ static void esw_unset_master_egress_rule(struct mlx5_core_dev *dev,
vport = mlx5_eswitch_get_vport(dev->priv.eswitch,
dev->priv.eswitch->manager_vport);
+ if (!vport->egress.acl)
+ return;
+
esw_acl_egress_ofld_bounce_rule_destroy(vport, MLX5_CAP_GEN(slave_dev, vhca_id));
if (xa_empty(&vport->egress.offloads.bounce_rules)) {
@@ -3182,6 +3320,9 @@ int mlx5_eswitch_offloads_single_fdb_add_one(struct mlx5_eswitch *master_esw,
if (err)
return err;
+ if (!mlx5_sd_is_primary(slave_esw->dev))
+ return 0;
+
err = esw_set_master_egress_rule(master_esw->dev,
slave_esw->dev, max_slaves);
if (err)
@@ -3201,6 +3342,18 @@ void mlx5_eswitch_offloads_single_fdb_del_one(struct mlx5_eswitch *master_esw,
esw_unset_master_egress_rule(master_esw->dev, slave_esw->dev);
}
+int mlx5_eswitch_offloads_vport_lag_add_one(struct mlx5_eswitch *master_esw,
+ struct mlx5_eswitch *slave_esw)
+{
+ return esw_set_slave_egress_rule(master_esw->dev, slave_esw->dev);
+}
+
+void mlx5_eswitch_offloads_vport_lag_del_one(struct mlx5_eswitch *master_esw,
+ struct mlx5_eswitch *slave_esw)
+{
+ esw_unset_slave_egress_rule(master_esw->dev, slave_esw->dev);
+}
+
#define ESW_OFFLOADS_DEVCOM_PAIR (0)
#define ESW_OFFLOADS_DEVCOM_UNPAIR (1)
@@ -3401,7 +3554,7 @@ void mlx5_esw_offloads_devcom_init(struct mlx5_eswitch *esw,
return;
if ((MLX5_VPORT_MANAGER(esw->dev) || mlx5_core_is_ecpf_esw_manager(esw->dev)) &&
- !mlx5_lag_is_supported(esw->dev))
+ (!mlx5_lag_is_supported(esw->dev) && !mlx5_get_sd(esw->dev)))
return;
xa_init(&esw->paired);
@@ -3472,12 +3625,12 @@ u32 mlx5_esw_match_metadata_alloc(struct mlx5_eswitch *esw)
u32 vport_end_ida = (1 << ESW_VPORT_BITS) - 1;
/* Reserve 0xf for internal port offload */
u32 max_pf_num = (1 << ESW_PFNUM_BITS) - 2;
- u32 pf_num;
+ int pf_num;
int id;
/* Only 4 bits of pf_num */
- pf_num = mlx5_get_dev_index(esw->dev);
- if (pf_num > max_pf_num)
+ pf_num = mlx5_sd_pf_num_get(esw->dev);
+ if (pf_num < 0 || pf_num > max_pf_num)
return 0;
/* Metadata is 4 bits of PFNUM and 12 bits of unique id */
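
Illustration (not part of the commit): the comments in the hunk above describe a 16-bit metadata value made of 4 bits of pf_num and 12 bits of unique id, with pf_num 0xf reserved. A minimal userspace sketch of that layout follows. PFNUM_BITS, VPORT_BITS and metadata_pack() are hypothetical names, placing pf_num in the high bits is an assumption, and so is treating id 0 as invalid; the hunk itself only shows the range check on pf_num.

```c
#include <assert.h>
#include <stdint.h>

#define PFNUM_BITS 4  /* "Only 4 bits of pf_num" */
#define VPORT_BITS 12 /* "12 bits of unique id" */

/*
 * Pack pf_num and id into one metadata value, or return 0 when either
 * input is out of range. pf_num is signed so that a negative error code
 * from the lookup can be rejected explicitly, as in the hunk above.
 */
static uint32_t metadata_pack(int pf_num, uint32_t id)
{
	const int max_pf_num = (1 << PFNUM_BITS) - 2; /* 0xf is reserved */
	const uint32_t max_id = (1u << VPORT_BITS) - 1;
	uint32_t md;

	if (pf_num < 0 || pf_num > max_pf_num)
		return 0;
	if (id == 0 || id > max_id) /* assumption: ids start at 1 */
		return 0;

	md = ((uint32_t)pf_num << VPORT_BITS) | id;
	assert((md >> VPORT_BITS) == (uint32_t)pf_num);
	return md;
}
```

With these assumptions, pf_num 3 and id 0x2a pack to 0x302a, and the largest accepted pf_num is 14.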
@@ -3522,6 +3675,7 @@ static void esw_offloads_vport_metadata_cleanup(struct mlx5_eswitch *esw,
WARN_ON(vport->metadata != vport->default_metadata);
mlx5_esw_match_metadata_free(esw, vport->default_metadata);
+ vport->default_metadata = 0;
}
static void esw_offloads_metadata_uninit(struct mlx5_eswitch *esw)
@@ -3558,6 +3712,73 @@ metadata_err:
return err;
}
+/* Deferred metadata init for SD devices: allocate vport metadata and
+ * refresh the ingress ACL for every vport whose ACL was created with
+ * metadata=0 in esw_create_offloads_acl_tables() / esw_vport_setup().
+ *
+ * No Rep is loaded at this point ==> no Rep net-dev exists, so no need
+ * to take rtnl lock.
+ *
+ * Safe to call multiple times - subsequent calls are no-ops.
+ */
+int mlx5_esw_offloads_init_deferred_metadata(struct mlx5_eswitch *esw)
+{
+ struct mlx5_vport *manager, *vport;
+ unsigned long i;
+ int err;
+
+ if (!mlx5_eswitch_vport_match_metadata_enabled(esw))
+ return 0;
+
+ manager = mlx5_eswitch_get_vport(esw, esw->manager_vport);
+ if (IS_ERR(manager))
+ return PTR_ERR(manager);
+
+ /* Sanity check: skip if metadata was already initialized */
+ if (manager->default_metadata)
+ return 0;
+
+ err = esw_offloads_metadata_init(esw);
+ if (err)
+ return err;
+
+ mutex_lock(&esw->state_lock);
+ /* Manager vport doesn't have a rep/netdev loaded but its ingress ACL
+ * was programmed with metadata=0 - refresh it explicitly.
+ */
+ err = mlx5_esw_acl_ingress_vport_metadata_update(esw,
+ esw->manager_vport,
+ 0);
+ if (err)
+ goto err_acl;
+
+ /* UPLINK is never marked enabled but its ACL is programmed in
+ * esw_create_offloads_acl_tables(); refresh it explicitly.
+ */
+ err = mlx5_esw_acl_ingress_vport_metadata_update(esw, MLX5_VPORT_UPLINK,
+ 0);
+ if (err)
+ goto err_acl;
+
+ mlx5_esw_for_each_vport(esw, i, vport) {
+ if (!vport || !vport->enabled)
+ continue;
+ err = mlx5_esw_acl_ingress_vport_metadata_update(esw,
+ vport->vport,
+ 0);
+ if (err)
+ goto err_acl;
+ }
+
+ mutex_unlock(&esw->state_lock);
+ return 0;
+
+err_acl:
+ esw_offloads_metadata_uninit(esw);
+ mutex_unlock(&esw->state_lock);
+ return err;
+}
+
int
esw_vport_create_offloads_acl_tables(struct mlx5_eswitch *esw,
struct mlx5_vport *vport)
@@ -3630,6 +3851,21 @@ static void esw_destroy_offloads_acl_tables(struct mlx5_eswitch *esw)
esw_vport_destroy_offloads_acl_tables(esw, vport);
}
+void mlx5_eswitch_unload_reps(struct mlx5_eswitch *esw)
+{
+ struct mlx5_eswitch_rep *rep;
+ unsigned long i;
+
+ if (!esw || esw->mode != MLX5_ESWITCH_OFFLOADS)
+ return;
+
+ mlx5_esw_for_each_rep(esw, i, rep) {
+ if (rep->vport == MLX5_VPORT_UPLINK)
+ continue;
+ mlx5_esw_offloads_rep_unload(esw, rep->vport);
+ }
+}
+
int mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw)
{
struct mlx5_eswitch_rep *rep;
@@ -3643,11 +3879,23 @@ int mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw)
if (atomic_read(&rep->rep_data[REP_ETH].state) != REP_LOADED)
return 0;
- ret = __esw_offloads_load_rep(esw, rep, REP_IB, NULL);
- if (ret)
- return ret;
+ /* SD secondary devices share the primary's uplink and do not
+ * have their own uplink representor. Only load VF/SF vports.
+ */
+ if (mlx5_sd_is_primary(esw->dev)) {
+ ret = __esw_offloads_load_rep(esw, rep, REP_IB, NULL);
+ if (ret)
+ return ret;
+ }
mlx5_esw_for_each_rep(esw, i, rep) {
+ if (!mlx5_sd_is_primary(esw->dev) &&
+ rep->vport == MLX5_VPORT_UPLINK)
+ continue;
+ if (rep->vport != MLX5_VPORT_UPLINK &&
+ mlx5_get_sd(esw->dev) && !mlx5_lag_is_active(esw->dev))
+ continue;
+
if (atomic_read(&rep->rep_data[REP_ETH].state) == REP_LOADED)
__esw_offloads_load_rep(esw, rep, REP_IB, NULL);
}
@@ -3892,9 +4140,14 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
if (err)
goto err_roce;
- err = esw_offloads_metadata_init(esw);
- if (err)
- goto err_metadata;
+ /* SD devices defer metadata init until SD is ready and
+ * mlx5_sd_pf_num_get() can return the correct pf_num.
+ */
+ if (!mlx5_get_sd(esw->dev)) {
+ err = esw_offloads_metadata_init(esw);
+ if (err)
+ goto err_metadata;
+ }
err = esw_set_passing_vport_metadata(esw, true);
if (err)
@@ -4219,12 +4472,6 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
if (esw_mode_from_devlink(mode, &mlx5_mode))
return -EINVAL;
- if (mlx5_mode == MLX5_ESWITCH_OFFLOADS && mlx5_get_sd(esw->dev)) {
- NL_SET_ERR_MSG_MOD(extack,
- "Can't change E-Switch mode to switchdev when multi-PF netdev (Socket Direct) is configured.");
- return -EPERM;
- }
-
/* Avoid try_lock, active/inactive mode change is not restricted */
if (mlx5_devlink_switchdev_active_mode_change(esw, mode))
return 0;
@@ -4298,6 +4545,9 @@ unlock:
mlx5_esw_unlock(esw);
enable_lag:
mlx5_lag_enable_change(esw->dev);
+ /* Shared FDB activation is creating LAG which is changing reps. */
+ if (!err)
+ mlx5_sd_eswitch_mode_set(esw->dev, mlx5_mode);
return err;
}
@@ -4586,13 +4836,25 @@ mlx5_eswitch_register_vport_reps_blocked(struct mlx5_eswitch *esw,
static void mlx5_eswitch_reload_reps_blocked(struct mlx5_eswitch *esw)
{
+ struct mlx5_eswitch_rep *uplink;
struct mlx5_vport *vport;
+ bool newly_loaded;
unsigned long i;
if (esw->mode != MLX5_ESWITCH_OFFLOADS)
return;
- if (mlx5_esw_offloads_rep_load(esw, MLX5_VPORT_UPLINK))
+ uplink = mlx5_eswitch_get_rep(esw, MLX5_VPORT_UPLINK);
+ if (__esw_offloads_load_rep(esw, uplink, REP_ETH, &newly_loaded))
+ return;
+ if (mlx5_sd_is_primary(esw->dev) &&
+ __esw_offloads_load_rep(esw, uplink, REP_IB, NULL)) {
+ if (newly_loaded)
+ __esw_offloads_unload_rep(esw, uplink, REP_ETH);
+ return;
+ }
+
+ if (mlx5_get_sd(esw->dev) && !mlx5_lag_is_active(esw->dev))
return;
mlx5_esw_for_each_vport(esw, i, vport) {
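
Illustration (not part of the commit): several hunks in this file repeat the same gating conditions for loading representors on SD devices. The sketch below restates them as two predicates over a small state struct. The names dev_state, should_load_rep() and should_reload_ib_rep() are hypothetical, VPORT_UPLINK is a placeholder for MLX5_VPORT_UPLINK, and the model assumes that a non-SD device reports itself as primary, which this diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VPORT_UPLINK 0xffffu /* placeholder for MLX5_VPORT_UPLINK */

struct dev_state {
	bool is_sd;         /* models mlx5_get_sd(dev) != NULL */
	bool is_sd_primary; /* models mlx5_sd_is_primary(dev) */
	bool lag_active;    /* models mlx5_lag_is_active(dev) */
};

/* Non-uplink reps on an SD device wait until the LAG is active. */
static bool should_load_rep(const struct dev_state *d, uint16_t vport_num)
{
	assert(d != NULL);
	if (vport_num != VPORT_UPLINK && d->is_sd && !d->lag_active)
		return false;
	return true;
}

/* IB reload: SD secondaries additionally skip the uplink rep. */
static bool should_reload_ib_rep(const struct dev_state *d, uint16_t vport_num)
{
	assert(d != NULL);
	if (!d->is_sd_primary && vport_num == VPORT_UPLINK)
		return false;
	return should_load_rep(d, vport_num);
}
```

Read this way, an SD secondary never reloads an uplink IB rep, and its VF/SF reps become eligible only once the SD LAG is active.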
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 1cd4cd898ec2..8af73393770c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -1217,3 +1217,24 @@ int mlx5_fs_cmd_set_tx_flow_table_root(struct mlx5_core_dev *dev, u32 ft_id, boo
return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
}
+
+int mlx5_fs_cmd_query_l2table_silent(struct mlx5_core_dev *dev, u8 *silent_mode)
+{
+ u32 out[MLX5_ST_SZ_DW(query_l2_table_entry_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(query_l2_table_entry_in)] = {};
+ int err;
+
+ if (!MLX5_CAP_GEN(dev, silent_mode_query))
+ return -EOPNOTSUPP;
+
+ MLX5_SET(query_l2_table_entry_in, in, opcode,
+ MLX5_CMD_OP_QUERY_L2_TABLE_ENTRY);
+ MLX5_SET(query_l2_table_entry_in, in, silent_mode_query, 1);
+
+ err = mlx5_cmd_exec_inout(dev, query_l2_table_entry, in, out);
+ if (err)
+ return err;
+
+ *silent_mode = MLX5_GET(query_l2_table_entry_out, out, silent_mode);
+ return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 7eb7b3ffe3d8..60280ff7da50 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -124,6 +124,8 @@ const struct mlx5_flow_cmds *mlx5_fs_cmd_get_fw_cmds(void);
int mlx5_fs_cmd_set_l2table_entry_silent(struct mlx5_core_dev *dev, u8 silent_mode);
int mlx5_fs_cmd_set_tx_flow_table_root(struct mlx5_core_dev *dev, u32 ft_id, bool disconnect);
+int mlx5_fs_cmd_query_l2table_silent(struct mlx5_core_dev *dev,
+ u8 *silent_mode);
static inline bool mlx5_fs_cmd_is_fw_term_table(struct mlx5_flow_table *ft)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index dd3f18f85466..28d16fdc3f06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -139,9 +139,44 @@ static int mlx5_cmd_modify_lag(struct mlx5_core_dev *dev, struct mlx5_lag *ldev,
return mlx5_cmd_exec_in(dev, modify_lag, in);
}
+static u32 mlx5_lag_dev_group_id(struct mlx5_core_dev *dev)
+{
+ struct mlx5_lag *ldev = mlx5_lag_dev(dev);
+ struct lag_func *pf;
+ int i;
+
+ if (!ldev)
+ return 0;
+
+ mlx5_lag_for_each(i, 0, ldev, MLX5_LAG_FILTER_ALL) {
+ pf = mlx5_lag_pf(ldev, i);
+ if (pf->dev == dev)
+ return pf->sd_fdb_active ? pf->group_id : 0;
+ }
+ return 0;
+}
+
+static int mlx5_lag_is_sw_lag(struct mlx5_core_dev *dev)
+{
+ return mlx5_lag_is_sd(dev);
+}
+
int mlx5_cmd_create_vport_lag(struct mlx5_core_dev *dev)
{
u32 in[MLX5_ST_SZ_DW(create_vport_lag_in)] = {};
+ struct mlx5_lag *ldev = mlx5_lag_dev(dev);
+ int ret;
+
+ if (mlx5_lag_is_sw_lag(dev)) {
+ if (!ldev)
+ return -ENODEV;
+
+ mutex_lock(&ldev->lock);
+ ret = mlx5_lag_create_vport_lag(mlx5_lag_dev(dev),
+ mlx5_lag_dev_group_id(dev));
+ mutex_unlock(&ldev->lock);
+ return ret;
+ }
MLX5_SET(create_vport_lag_in, in, opcode, MLX5_CMD_OP_CREATE_VPORT_LAG);
@@ -152,6 +187,18 @@ EXPORT_SYMBOL(mlx5_cmd_create_vport_lag);
int mlx5_cmd_destroy_vport_lag(struct mlx5_core_dev *dev)
{
u32 in[MLX5_ST_SZ_DW(destroy_vport_lag_in)] = {};
+ struct mlx5_lag *ldev = mlx5_lag_dev(dev);
+
+ if (mlx5_lag_is_sw_lag(dev)) {
+ if (!ldev)
+ return 0;
+
+ mutex_lock(&ldev->lock);
+ mlx5_lag_destroy_vport_lag(mlx5_lag_dev(dev),
+ mlx5_lag_dev_group_id(dev));
+ mutex_unlock(&ldev->lock);
+ return 0;
+ }
MLX5_SET(destroy_vport_lag_in, in, opcode, MLX5_CMD_OP_DESTROY_VPORT_LAG);
@@ -1265,6 +1312,32 @@ int mlx5_lag_reload_ib_reps_from_locked(struct mlx5_lag *ldev, u32 flags,
return mlx5_lag_reload_ib_reps(ldev, flags, filter, cont_on_fail);
}
+static void mlx5_lag_unload_reps_unlocked(struct mlx5_lag *ldev, u32 filter)
+{
+ struct lag_func *pf;
+ int i;
+
+ mlx5_lag_for_each(i, 0, ldev, filter) {
+ struct mlx5_eswitch *esw;
+
+ pf = mlx5_lag_pf(ldev, i);
+ esw = pf->dev->priv.eswitch;
+ mlx5_esw_reps_block(esw);
+ mlx5_eswitch_unload_reps(esw);
+ mlx5_esw_reps_unblock(esw);
+ }
+}
+
+void mlx5_lag_unload_reps_from_locked(struct mlx5_lag *ldev, u32 filter)
+{
+ /* Same lock dance as mlx5_lag_reload_ib_reps: drop ldev->lock around
+ * the per-eswitch reps_lock to keep the reps_lock -> ldev->lock order.
+ */
+ mlx5_lag_drop_lock_for_reps(ldev, filter);
+ mlx5_lag_unload_reps_unlocked(ldev, filter);
+ mlx5_lag_retake_lock_after_reps(ldev);
+}
+
void mlx5_disable_lag(struct mlx5_lag *ldev)
{
bool shared_fdb = test_bit(MLX5_LAG_MODE_FLAG_SHARED_FDB, &ldev->mode_flags);
@@ -1590,7 +1663,7 @@ struct mlx5_devcom_comp_dev *mlx5_lag_get_devcom_comp(struct mlx5_lag *ldev)
static int mlx5_lag_demux_ft_fg_init(struct mlx5_core_dev *dev,
struct mlx5_flow_table_attr *ft_attr,
- struct mlx5_lag *ldev)
+ struct lag_func *pf)
{
#ifdef CONFIG_MLX5_ESWITCH
struct mlx5_flow_namespace *ns;
@@ -1601,20 +1674,20 @@ static int mlx5_lag_demux_ft_fg_init(struct mlx5_core_dev *dev,
if (!ns)
return 0;
- ldev->lag_demux_ft = mlx5_create_flow_table(ns, ft_attr);
- if (IS_ERR(ldev->lag_demux_ft))
- return PTR_ERR(ldev->lag_demux_ft);
+ pf->lag_demux_ft = mlx5_create_flow_table(ns, ft_attr);
+ if (IS_ERR(pf->lag_demux_ft))
+ return PTR_ERR(pf->lag_demux_ft);
fg = mlx5_esw_lag_demux_fg_create(dev->priv.eswitch,
- ldev->lag_demux_ft);
+ pf->lag_demux_ft);
if (IS_ERR(fg)) {
err = PTR_ERR(fg);
- mlx5_destroy_flow_table(ldev->lag_demux_ft);
- ldev->lag_demux_ft = NULL;
+ mlx5_destroy_flow_table(pf->lag_demux_ft);
+ pf->lag_demux_ft = NULL;
return err;
}
- ldev->lag_demux_fg = fg;
+ pf->lag_demux_fg = fg;
return 0;
#else
return -EOPNOTSUPP;
@@ -1623,7 +1696,7 @@ static int mlx5_lag_demux_ft_fg_init(struct mlx5_core_dev *dev,
static int mlx5_lag_demux_fw_init(struct mlx5_core_dev *dev,
struct mlx5_flow_table_attr *ft_attr,
- struct mlx5_lag *ldev)
+ struct lag_func *pf)
{
struct mlx5_flow_namespace *ns;
int err;
@@ -1632,12 +1705,12 @@ static int mlx5_lag_demux_fw_init(struct mlx5_core_dev *dev,
if (!ns)
return 0;
- ldev->lag_demux_fg = NULL;
+ pf->lag_demux_fg = NULL;
ft_attr->max_fte = 1;
- ldev->lag_demux_ft = mlx5_create_lag_demux_flow_table(ns, ft_attr);
- if (IS_ERR(ldev->lag_demux_ft)) {
- err = PTR_ERR(ldev->lag_demux_ft);
- ldev->lag_demux_ft = NULL;
+ pf->lag_demux_ft = mlx5_create_lag_demux_flow_table(ns, ft_attr);
+ if (IS_ERR(pf->lag_demux_ft)) {
+ err = PTR_ERR(pf->lag_demux_ft);
+ pf->lag_demux_ft = NULL;
return err;
}
@@ -1648,6 +1721,7 @@ int mlx5_lag_demux_init(struct mlx5_core_dev *dev,
struct mlx5_flow_table_attr *ft_attr)
{
struct mlx5_lag *ldev;
+ struct lag_func *pf;
if (!ft_attr)
return -EINVAL;
@@ -1656,12 +1730,16 @@ int mlx5_lag_demux_init(struct mlx5_core_dev *dev,
if (!ldev)
return -ENODEV;
- xa_init(&ldev->lag_demux_rules);
+ pf = mlx5_lag_pf_by_dev(ldev, dev);
+ if (!pf)
+ return -ENODEV;
+
+ xa_init(&pf->lag_demux_rules);
- if (mlx5_get_sd(dev))
- return mlx5_lag_demux_ft_fg_init(dev, ft_attr, ldev);
+ if (mlx5_lag_is_sw_lag(dev))
+ return mlx5_lag_demux_ft_fg_init(dev, ft_attr, pf);
- return mlx5_lag_demux_fw_init(dev, ft_attr, ldev);
+ return mlx5_lag_demux_fw_init(dev, ft_attr, pf);
}
EXPORT_SYMBOL(mlx5_lag_demux_init);
@@ -1670,40 +1748,63 @@ void mlx5_lag_demux_cleanup(struct mlx5_core_dev *dev)
struct mlx5_flow_handle *rule;
struct mlx5_lag *ldev;
unsigned long vport_num;
+ struct lag_func *pf;
ldev = mlx5_lag_dev(dev);
if (!ldev)
return;
- xa_for_each(&ldev->lag_demux_rules, vport_num, rule)
+ pf = mlx5_lag_pf_by_dev(ldev, dev);
+ if (!pf)
+ return;
+
+ xa_for_each(&pf->lag_demux_rules, vport_num, rule)
mlx5_del_flow_rules(rule);
- xa_destroy(&ldev->lag_demux_rules);
+ xa_destroy(&pf->lag_demux_rules);
- if (ldev->lag_demux_fg)
- mlx5_destroy_flow_group(ldev->lag_demux_fg);
- if (ldev->lag_demux_ft)
- mlx5_destroy_flow_table(ldev->lag_demux_ft);
- ldev->lag_demux_fg = NULL;
- ldev->lag_demux_ft = NULL;
+ if (pf->lag_demux_fg)
+ mlx5_destroy_flow_group(pf->lag_demux_fg);
+ if (pf->lag_demux_ft)
+ mlx5_destroy_flow_table(pf->lag_demux_ft);
+ pf->lag_demux_fg = NULL;
+ pf->lag_demux_ft = NULL;
}
EXPORT_SYMBOL(mlx5_lag_demux_cleanup);
+static struct lag_func *mlx5_lag_dev_get_master_pf(struct mlx5_lag *ldev,
+ struct mlx5_core_dev *dev)
+{
+ u32 filter = mlx5_lag_get_filter(ldev, dev);
+ int idx;
+
+ idx = mlx5_lag_get_dev_index_by_seq_filter(ldev, MLX5_LAG_P1, filter);
+ if (idx < 0)
+ return NULL;
+
+ return mlx5_lag_pf(ldev, idx);
+}
+
int mlx5_lag_demux_rule_add(struct mlx5_core_dev *vport_dev, u16 vport_num,
int index)
{
struct mlx5_flow_handle *rule;
+ struct lag_func *master;
struct mlx5_lag *ldev;
int err;
ldev = mlx5_lag_dev(vport_dev);
- if (!ldev || !ldev->lag_demux_fg)
+ if (!ldev)
+ return 0;
+
+ master = mlx5_lag_dev_get_master_pf(ldev, vport_dev);
+ if (!master || !master->lag_demux_fg)
return 0;
- if (xa_load(&ldev->lag_demux_rules, index))
+ if (xa_load(&master->lag_demux_rules, index))
return 0;
rule = mlx5_esw_lag_demux_rule_create(vport_dev->priv.eswitch,
- vport_num, ldev->lag_demux_ft);
+ vport_num, master->lag_demux_ft);
if (IS_ERR(rule)) {
err = PTR_ERR(rule);
mlx5_core_warn(vport_dev,
@@ -1712,7 +1813,7 @@ int mlx5_lag_demux_rule_add(struct mlx5_core_dev *vport_dev, u16 vport_num,
return err;
}
- err = xa_err(xa_store(&ldev->lag_demux_rules, index, rule,
+ err = xa_err(xa_store(&master->lag_demux_rules, index, rule,
GFP_KERNEL));
if (err) {
mlx5_del_flow_rules(rule);
@@ -1728,13 +1829,18 @@ EXPORT_SYMBOL(mlx5_lag_demux_rule_add);
void mlx5_lag_demux_rule_del(struct mlx5_core_dev *dev, int index)
{
struct mlx5_flow_handle *rule;
+ struct lag_func *master_pf;
struct mlx5_lag *ldev;
ldev = mlx5_lag_dev(dev);
- if (!ldev || !ldev->lag_demux_fg)
+ if (!ldev)
+ return;
+
+ master_pf = mlx5_lag_dev_get_master_pf(ldev, dev);
+ if (!master_pf || !master_pf->lag_demux_fg)
return;
- rule = xa_erase(&ldev->lag_demux_rules, index);
+ rule = xa_erase(&master_pf->lag_demux_rules, index);
if (rule)
mlx5_del_flow_rules(rule);
}
@@ -2461,13 +2567,26 @@ EXPORT_SYMBOL(mlx5_lag_is_shared_fdb);
void mlx5_lag_disable_change(struct mlx5_core_dev *dev)
{
+ struct mlx5_devcom_comp_dev *sd_devcom = mlx5_sd_get_devcom(dev);
+ struct mlx5_core_dev *primary = dev;
struct mlx5_lag *ldev;
+ struct lag_func *pf;
+ bool mpesw;
+ int i;
ldev = mlx5_lag_dev(dev);
if (!ldev)
return;
- mlx5_devcom_comp_lock(dev->priv.hca_devcom_comp);
+ if (sd_devcom) {
+ mlx5_devcom_comp_lock(sd_devcom);
+ primary = mlx5_sd_get_primary(dev) ?: dev;
+ mlx5_devcom_comp_unlock(sd_devcom);
+ }
+ mlx5_devcom_comp_lock(primary->priv.hca_devcom_comp);
+ mpesw = ldev->mode == MLX5_LAG_MODE_MPESW;
+ if (mpesw)
+ mlx5_mpesw_sd_devcoms_lock(ldev);
mutex_lock(&ldev->lock);
ldev->mode_changes_in_progress++;
@@ -2479,7 +2598,25 @@ void mlx5_lag_disable_change(struct mlx5_core_dev *dev)
}
mutex_unlock(&ldev->lock);
- mlx5_devcom_comp_unlock(dev->priv.hca_devcom_comp);
+ if (mpesw)
+ mlx5_mpesw_sd_devcoms_unlock(ldev);
+ mlx5_devcom_comp_unlock(primary->priv.hca_devcom_comp);
+
+ if (!sd_devcom)
+ return;
+
+ /* Teardown SD shared FDB for this device's group if active */
+ mlx5_devcom_comp_lock(sd_devcom);
+ mutex_lock(&ldev->lock);
+ mlx5_lag_for_each(i, 0, ldev, MLX5_LAG_FILTER_ALL) {
+ pf = mlx5_lag_pf(ldev, i);
+ if (pf->dev == dev && pf->sd_fdb_active) {
+ mlx5_lag_shared_fdb_destroy(ldev, pf->group_id);
+ break;
+ }
+ }
+ mutex_unlock(&ldev->lock);
+ mlx5_devcom_comp_unlock(sd_devcom);
}
void mlx5_lag_enable_change(struct mlx5_core_dev *dev)
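
Illustration (not part of the commit): because sd_fdb_active is set on every member of an SD group, the teardown in mlx5_lag_disable_change() only needs to find the calling device's own entry. The userspace sketch below models that lookup. The names func_entry, active_group_of(), group_fdb_set() and teardown_for_dev() are hypothetical, and clearing the flag on all members during destroy is an assumption; the body of mlx5_lag_shared_fdb_destroy() is not shown in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct func_entry {
	int dev_id;         /* stands in for the mlx5_core_dev pointer */
	uint32_t group_id;  /* SD group id, 0 = not SD */
	bool sd_fdb_active; /* set on all SD group members */
};

/* Group id of dev_id if its SD shared FDB is active, otherwise 0. */
static uint32_t active_group_of(const struct func_entry *tbl, size_t n,
				int dev_id)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].dev_id == dev_id)
			return tbl[i].sd_fdb_active ? tbl[i].group_id : 0;
	return 0;
}

/* Set or clear the flag on every member of one group. */
static void group_fdb_set(struct func_entry *tbl, size_t n,
			  uint32_t group_id, bool active)
{
	if (!group_id)
		return;
	for (size_t i = 0; i < n; i++)
		if (tbl[i].group_id == group_id)
			tbl[i].sd_fdb_active = active;
}

/* Teardown from any one device; returns true if something was torn down. */
static bool teardown_for_dev(struct func_entry *tbl, size_t n, int dev_id)
{
	uint32_t group_id = active_group_of(tbl, n, dev_id);

	if (!group_id)
		return false;
	group_fdb_set(tbl, n, group_id, false);
	return true;
}
```

A second teardown call from a peer in the same group finds the flag already cleared and does nothing, so in this model it does not matter which member calls first.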
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
index 0296f752bb4c..e9f0ef83ce1d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
@@ -59,6 +59,10 @@ struct lag_func {
struct mlx5_nb port_change_nb;
u32 group_id; /* SD group ID, 0 = not SD */
bool sd_fdb_active; /* set on all SD group members */
+ /* Lag demux resources - only populated on master devices */
+ struct mlx5_flow_table *lag_demux_ft;
+ struct mlx5_flow_group *lag_demux_fg;
+ struct xarray lag_demux_rules;
};
/* Used for collection of netdev event info. */
@@ -95,9 +99,6 @@ struct mlx5_lag {
/* Protect lag fields/state changes */
struct mutex lock;
struct lag_mpesw lag_mpesw;
- struct mlx5_flow_table *lag_demux_ft;
- struct mlx5_flow_group *lag_demux_fg;
- struct xarray lag_demux_rules;
};
static inline struct mlx5_lag *
@@ -157,6 +158,14 @@ __mlx5_lag_is_sd(struct mlx5_lag *ldev, struct mlx5_core_dev *dev)
}
static inline bool
+__mlx5_lag_dev_is_port(struct mlx5_lag *ldev, struct mlx5_core_dev *dev)
+{
+ struct lag_func *pf = mlx5_lag_pf_by_dev(ldev, dev);
+
+ return pf && xa_get_mark(&ldev->pfs, pf->idx, MLX5_LAG_XA_MARK_PORT);
+}
+
+static inline bool
__mlx5_lag_is_active(struct mlx5_lag *ldev)
{
return ldev->mode != MLX5_LAG_MODE_NONE;
@@ -174,6 +183,8 @@ int mlx5_lag_shared_fdb_create(struct mlx5_lag *ldev,
enum mlx5_lag_mode mode,
u32 group_id);
void mlx5_lag_shared_fdb_destroy(struct mlx5_lag *ldev, u32 group_id);
+int mlx5_lag_create_vport_lag(struct mlx5_lag *ldev, u32 group_id);
+int mlx5_lag_destroy_vport_lag(struct mlx5_lag *ldev, u32 group_id);
int mlx5_lag_create_single_fdb(struct mlx5_lag *ldev);
void mlx5_lag_destroy_single_fdb(struct mlx5_lag *ldev);
bool mlx5_lag_shared_fdb_supported(struct mlx5_lag *ldev);
@@ -190,6 +201,18 @@ static inline int mlx5_lag_shared_fdb_create(struct mlx5_lag *ldev,
static inline void mlx5_lag_shared_fdb_destroy(struct mlx5_lag *ldev,
u32 group_id) {}
+static inline int mlx5_lag_create_vport_lag(struct mlx5_lag *ldev,
+ u32 group_id)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int mlx5_lag_destroy_vport_lag(struct mlx5_lag *ldev,
+ u32 group_id)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int mlx5_lag_create_single_fdb(struct mlx5_lag *ldev)
{
return -EOPNOTSUPP;
@@ -287,6 +310,7 @@ int mlx5_lag_num_devs(struct mlx5_lag *ldev);
int mlx5_lag_num_netdevs(struct mlx5_lag *ldev);
int mlx5_lag_reload_ib_reps_from_locked(struct mlx5_lag *ldev, u32 flags,
u32 filter, bool cont_on_fail);
+void mlx5_lag_unload_reps_from_locked(struct mlx5_lag *ldev, u32 filter);
int mlx5_ldev_add_mdev(struct mlx5_lag *ldev, struct mlx5_core_dev *dev,
u32 group_id);
void mlx5_ldev_remove_mdev(struct mlx5_lag *ldev, struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
index 2cb44084e239..50bfb450c71e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
@@ -15,7 +15,7 @@ static void mlx5_mpesw_metadata_cleanup(struct mlx5_lag *ldev)
u32 pf_metadata;
int i;
- mlx5_ldev_for_each(i, 0, ldev) {
+ mlx5_lag_for_each(i, 0, ldev, MLX5_LAG_FILTER_ALL) {
dev = mlx5_lag_pf(ldev, i)->dev;
esw = dev->priv.eswitch;
pf_metadata = ldev->lag_mpesw.pf_metadata[i];
@@ -36,7 +36,7 @@ static int mlx5_mpesw_metadata_set(struct mlx5_lag *ldev)
u32 pf_metadata;
int i, err;
- mlx5_ldev_for_each(i, 0, ldev) {
+ mlx5_lag_for_each(i, 0, ldev, MLX5_LAG_FILTER_ALL) {
dev = mlx5_lag_pf(ldev, i)->dev;
esw = dev->priv.eswitch;
pf_metadata = mlx5_esw_match_metadata_alloc(esw);
@@ -52,7 +52,7 @@ static int mlx5_mpesw_metadata_set(struct mlx5_lag *ldev)
goto err_metadata;
}
- mlx5_ldev_for_each(i, 0, ldev) {
+ mlx5_lag_for_each(i, 0, ldev, MLX5_LAG_FILTER_ALL) {
dev = mlx5_lag_pf(ldev, i)->dev;
mlx5_notifier_call_chain(dev->priv.events, MLX5_DEV_EVENT_MULTIPORT_ESW,
(void *)0);
@@ -65,6 +65,48 @@ err_metadata:
return err;
}
+static void mlx5_mpesw_restore_sd_fdb(struct mlx5_lag *ldev)
+{
+ struct lag_func *pf;
+ int err, i;
+
+ mlx5_ldev_for_each(i, 0, ldev) {
+ pf = mlx5_lag_pf(ldev, i);
+ err = mlx5_lag_shared_fdb_create(ldev, NULL, 0, pf->group_id);
+ if (err)
+ mlx5_core_warn(pf->dev,
+ "Failed to restore SD shared FDB (%d)\n",
+ err);
+ }
+}
+
+static int mlx5_mpesw_teardown_sd_fdb(struct mlx5_lag *ldev)
+{
+ struct lag_func *pf;
+ int i;
+
+ mlx5_ldev_for_each(i, 0, ldev) {
+ pf = mlx5_lag_pf(ldev, i);
+ if (!pf->sd_fdb_active)
+ continue;
+ mlx5_lag_shared_fdb_destroy(ldev, pf->group_id);
+ }
+ return 0;
+}
+
+static bool mlx5_lag_has_sd_group(struct mlx5_lag *ldev)
+{
+ struct lag_func *pf;
+ int i;
+
+ mlx5_ldev_for_each(i, 0, ldev) {
+ pf = mlx5_lag_pf(ldev, i);
+ if (pf->group_id)
+ return true;
+ }
+ return false;
+}
+
static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev)
{
int idx = mlx5_lag_get_dev_index_by_seq(ldev, MLX5_LAG_P1);
@@ -92,10 +134,17 @@ static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev)
if (err)
return err;
+ if (mlx5_lag_has_sd_group(ldev))
+ mlx5_mpesw_teardown_sd_fdb(ldev);
+
err = mlx5_lag_shared_fdb_create(ldev, NULL, MLX5_LAG_MODE_MPESW,
MLX5_LAG_FILTER_ALL);
if (err) {
- mlx5_core_warn(dev0, "Failed to create LAG in MPESW mode (%d)\n", err);
+ mlx5_core_warn(dev0,
+ "Failed to create LAG in MPESW mode (%d)\n",
+ err);
+ if (mlx5_lag_has_sd_group(ldev))
+ mlx5_mpesw_restore_sd_fdb(ldev);
mlx5_mpesw_metadata_cleanup(ldev);
return err;
}
@@ -105,9 +154,36 @@ static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev)
void mlx5_lag_disable_mpesw(struct mlx5_lag *ldev)
{
- if (ldev->mode == MLX5_LAG_MODE_MPESW) {
- mlx5_mpesw_metadata_cleanup(ldev);
- mlx5_lag_shared_fdb_destroy(ldev, MLX5_LAG_FILTER_ALL);
+ if (ldev->mode != MLX5_LAG_MODE_MPESW)
+ return;
+
+ mlx5_mpesw_metadata_cleanup(ldev);
+ mlx5_lag_shared_fdb_destroy(ldev, MLX5_LAG_FILTER_ALL);
+ if (mlx5_lag_has_sd_group(ldev))
+ mlx5_mpesw_restore_sd_fdb(ldev);
+}
+
+void mlx5_mpesw_sd_devcoms_lock(struct mlx5_lag *ldev)
+{
+ struct mlx5_devcom_comp_dev *sd_devcom;
+ int i;
+
+ mlx5_ldev_for_each(i, 0, ldev) {
+ sd_devcom = mlx5_sd_get_devcom(mlx5_lag_pf(ldev, i)->dev);
+ if (sd_devcom)
+ mlx5_devcom_comp_lock(sd_devcom);
+ }
+}
+
+void mlx5_mpesw_sd_devcoms_unlock(struct mlx5_lag *ldev)
+{
+ struct mlx5_devcom_comp_dev *sd_devcom;
+ int i;
+
+ mlx5_ldev_for_each_reverse(i, MLX5_MAX_PORTS, 0, ldev) {
+ sd_devcom = mlx5_sd_get_devcom(mlx5_lag_pf(ldev, i)->dev);
+ if (sd_devcom)
+ mlx5_devcom_comp_unlock(sd_devcom);
}
}
@@ -122,6 +198,7 @@ static void mlx5_mpesw_work(struct work_struct *work)
return;
mlx5_devcom_comp_lock(devcom);
+ mlx5_mpesw_sd_devcoms_lock(ldev);
mutex_lock(&ldev->lock);
if (ldev->mode_changes_in_progress) {
mpesww->result = -EAGAIN;
@@ -134,6 +211,7 @@ static void mlx5_mpesw_work(struct work_struct *work)
mlx5_lag_disable_mpesw(ldev);
unlock:
mutex_unlock(&ldev->lock);
+ mlx5_mpesw_sd_devcoms_unlock(ldev);
mlx5_devcom_comp_unlock(devcom);
complete(&mpesww->comp);
}
@@ -199,7 +277,8 @@ bool mlx5_lag_is_mpesw(struct mlx5_core_dev *dev)
{
struct mlx5_lag *ldev = mlx5_lag_dev(dev);
- return ldev && ldev->mode == MLX5_LAG_MODE_MPESW;
+ return ldev && ldev->mode == MLX5_LAG_MODE_MPESW &&
+ __mlx5_lag_dev_is_port(ldev, dev);
}
EXPORT_SYMBOL(mlx5_lag_is_mpesw);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h
index b767dbb4f457..5099723ba0f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h
@@ -33,8 +33,12 @@ void mlx5_lag_mpesw_disable(struct mlx5_core_dev *dev);
int mlx5_lag_mpesw_enable(struct mlx5_core_dev *dev);
#ifdef CONFIG_MLX5_ESWITCH
void mlx5_lag_disable_mpesw(struct mlx5_lag *ldev);
+void mlx5_mpesw_sd_devcoms_lock(struct mlx5_lag *ldev);
+void mlx5_mpesw_sd_devcoms_unlock(struct mlx5_lag *ldev);
#else
static inline void mlx5_lag_disable_mpesw(struct mlx5_lag *ldev) {}
+static inline void mlx5_mpesw_sd_devcoms_lock(struct mlx5_lag *ldev) {}
+static inline void mlx5_mpesw_sd_devcoms_unlock(struct mlx5_lag *ldev) {}
#endif /* CONFIG_MLX5_ESWITCH */
#ifdef CONFIG_MLX5_ESWITCH
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/shared_fdb.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/shared_fdb.c
index 1371e14c4c13..113866494d16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/shared_fdb.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/shared_fdb.c
@@ -89,6 +89,76 @@ err:
return err;
}
+int mlx5_lag_create_vport_lag(struct mlx5_lag *ldev, u32 group_id)
+{
+ u32 filter = group_id ? group_id : MLX5_LAG_FILTER_ALL;
+ int master_idx = mlx5_lag_get_dev_index_by_seq_filter(ldev, MLX5_LAG_P1,
+ filter);
+ struct mlx5_eswitch *master_esw;
+ struct mlx5_core_dev *dev0;
+ int i, j;
+ int err;
+
+ if (master_idx < 0)
+ return -EINVAL;
+
+ dev0 = mlx5_lag_pf(ldev, master_idx)->dev;
+ master_esw = dev0->priv.eswitch;
+
+ mlx5_lag_for_each(i, 0, ldev, filter) {
+ struct mlx5_eswitch *slave_esw;
+
+ if (i == master_idx)
+ continue;
+
+ slave_esw = mlx5_lag_pf(ldev, i)->dev->priv.eswitch;
+ err = mlx5_eswitch_offloads_vport_lag_add_one(master_esw,
+ slave_esw);
+ if (err)
+ goto err;
+ }
+
+ return 0;
+
+err:
+ mlx5_lag_for_each_reverse(j, i - 1, 0, ldev, filter) {
+ struct mlx5_eswitch *slave_esw;
+
+ if (j == master_idx)
+ continue;
+ slave_esw = mlx5_lag_pf(ldev, j)->dev->priv.eswitch;
+ mlx5_eswitch_offloads_vport_lag_del_one(master_esw, slave_esw);
+ }
+ return err;
+}
+
+int mlx5_lag_destroy_vport_lag(struct mlx5_lag *ldev, u32 group_id)
+{
+ u32 filter = group_id ? group_id : MLX5_LAG_FILTER_ALL;
+ int master_idx = mlx5_lag_get_dev_index_by_seq_filter(ldev, MLX5_LAG_P1,
+ filter);
+ struct mlx5_eswitch *master_esw;
+ struct mlx5_core_dev *dev0;
+ int i;
+
+ if (master_idx < 0)
+ return 0;
+
+ dev0 = mlx5_lag_pf(ldev, master_idx)->dev;
+ master_esw = dev0->priv.eswitch;
+
+ mlx5_lag_for_each(i, 0, ldev, filter) {
+ struct mlx5_core_dev *dev;
+
+ if (i == master_idx)
+ continue;
+ dev = mlx5_lag_pf(ldev, i)->dev;
+ mlx5_eswitch_offloads_vport_lag_del_one(master_esw,
+ dev->priv.eswitch);
+ }
+ return 0;
+}
+
static void mlx5_lag_destroy_single_fdb_filter(struct mlx5_lag *ldev,
u32 filter)
{
@@ -141,7 +211,7 @@ int mlx5_lag_shared_fdb_create(struct mlx5_lag *ldev,
enum mlx5_lag_mode mode,
u32 group_id)
{
- u32 filter = group_id ? group_id : MLX5_LAG_FILTER_PORTS;
+ u32 filter = group_id ? group_id : MLX5_LAG_FILTER_ALL;
int idx = mlx5_lag_get_dev_index_by_seq_filter(ldev, MLX5_LAG_P1,
filter);
struct mlx5_core_dev *dev0;
@@ -209,7 +279,7 @@ err_add_devices:
void mlx5_lag_shared_fdb_destroy(struct mlx5_lag *ldev, u32 group_id)
{
- u32 filter = group_id ? group_id : MLX5_LAG_FILTER_PORTS;
+ u32 filter = group_id ? group_id : MLX5_LAG_FILTER_ALL;
struct lag_func *pf;
int err;
int i;
@@ -226,6 +296,7 @@ void mlx5_lag_shared_fdb_destroy(struct mlx5_lag *ldev, u32 group_id)
pf->sd_fdb_active = false;
}
mlx5_lag_destroy_single_fdb_filter(ldev, group_id);
+ mlx5_lag_unload_reps_from_locked(ldev, filter);
}
mlx5_lag_add_devices_filter(ldev, filter);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
index d40c53193ea8..64f92427602d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
@@ -287,9 +287,9 @@ int mlx5_devcom_comp_get_size(struct mlx5_devcom_comp_dev *devcom)
return kref_read(&comp->ref);
}
-int mlx5_devcom_send_event(struct mlx5_devcom_comp_dev *devcom,
- int event, int rollback_event,
- void *event_data)
+int mlx5_devcom_locked_send_event(struct mlx5_devcom_comp_dev *devcom,
+ int event, int rollback_event,
+ void *event_data)
{
struct mlx5_devcom_comp_dev *pos;
struct mlx5_devcom_comp *comp;
@@ -299,24 +299,28 @@ int mlx5_devcom_send_event(struct mlx5_devcom_comp_dev *devcom,
if (!devcom)
return -ENODEV;
+ lockdep_assert_held_write(&devcom->comp->sem);
comp = devcom->comp;
- down_write(&comp->sem);
list_for_each_entry(pos, &comp->comp_dev_list_head, list) {
data = rcu_dereference_protected(pos->data, lockdep_is_held(&comp->sem));
if (pos != devcom && data) {
err = comp->handler(event, data, event_data);
- if (err)
+ if (err && rollback_event != DEVCOM_CANT_FAIL) {
goto rollback;
+ } else if (err && rollback_event == DEVCOM_CANT_FAIL) {
+ WARN_ONCE(1, "devcom component %d event %d failed: %d\n",
+ comp->id, event, err);
+ return err;
+ }
}
}
- up_write(&comp->sem);
return 0;
rollback:
if (list_entry_is_head(pos, &comp->comp_dev_list_head, list))
- goto out;
+ return err;
pos = list_prev_entry(pos, list);
list_for_each_entry_from_reverse(pos, &comp->comp_dev_list_head, list) {
data = rcu_dereference_protected(pos->data, lockdep_is_held(&comp->sem));
@@ -324,7 +328,23 @@ rollback:
if (pos != devcom && data)
comp->handler(rollback_event, data, event_data);
}
-out:
+ return err;
+}
+
+int mlx5_devcom_send_event(struct mlx5_devcom_comp_dev *devcom,
+ int event, int rollback_event,
+ void *event_data)
+{
+ struct mlx5_devcom_comp *comp;
+ int err;
+
+ if (!devcom)
+ return -ENODEV;
+
+ comp = devcom->comp;
+ down_write(&comp->sem);
+ err = mlx5_devcom_locked_send_event(devcom, event, rollback_event,
+ event_data);
up_write(&comp->sem);
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
index 316052a85ca5..7a704fafdbd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
@@ -46,6 +46,11 @@ mlx5_devcom_register_component(struct mlx5_devcom_dev *devc,
void *data);
void mlx5_devcom_unregister_component(struct mlx5_devcom_comp_dev *devcom);
+#define DEVCOM_CANT_FAIL (INT_MAX)
+
+int mlx5_devcom_locked_send_event(struct mlx5_devcom_comp_dev *devcom,
+ int event, int rollback_event,
+ void *event_data);
int mlx5_devcom_send_event(struct mlx5_devcom_comp_dev *devcom,
int event, int rollback_event,
void *event_data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index 25286ecd724e..ee2fdefa1945 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -5,6 +5,8 @@
#include "../lag/lag.h"
#include "mlx5_core.h"
#include "lib/mlx5.h"
+#include "devlink.h"
+#include "eswitch.h"
#include "fs_cmd.h"
#include <linux/mlx5/eswitch.h>
#include <linux/mlx5/vport.h>
@@ -22,14 +24,19 @@ struct mlx5_sd {
struct dentry *dfs;
u8 state;
bool primary;
+ bool fw_silents_secondaries;
union {
struct { /* primary */
struct mlx5_core_dev *secondaries[MLX5_SD_MAX_GROUP_SZ - 1];
struct mlx5_flow_table *tx_ft;
+ /* Next index for secondary registration */
+ u8 next_secondary_idx;
};
struct { /* secondary */
struct mlx5_core_dev *primary_dev;
u32 alias_obj_id;
+ /* TX flow table root in switchdev (silent) config */
+ bool tx_root_silent;
};
};
};
@@ -82,6 +89,27 @@ bool mlx5_sd_is_primary(struct mlx5_core_dev *dev)
return sd->primary;
}
+int mlx5_sd_pf_num_get(struct mlx5_core_dev *dev)
+{
+ struct mlx5_sd *sd = mlx5_get_sd(dev);
+ int pf_num = mlx5_get_dev_index(dev);
+ struct mlx5_core_dev *pos;
+ int i;
+
+ if (!sd)
+ return pf_num;
+
+ mlx5_devcom_comp_assert_locked(sd->devcom);
+ if (!mlx5_devcom_comp_is_ready(sd->devcom))
+ return -ENODEV;
+
+ mlx5_sd_for_each_dev(i, mlx5_sd_get_primary(dev), pos)
+ if (pos == dev)
+ break;
+
+ return pf_num * sd->host_buses + i;
+}
+
struct mlx5_core_dev *
mlx5_sd_primary_get_peer(struct mlx5_core_dev *primary, int idx)
{
@@ -165,7 +193,8 @@ static bool mlx5_sd_caps_supported(struct mlx5_core_dev *dev, u8 host_buses)
/* Disconnect secondaries from the network */
if (!MLX5_CAP_GEN(dev, eswitch_manager))
return false;
- if (!MLX5_CAP_GEN(dev, silent_mode_set))
+ if (!MLX5_CAP_GEN(dev, silent_mode_set) &&
+ !MLX5_CAP_GEN(dev, silent_mode_query))
return false;
/* RX steering from primary to secondaries */
@@ -193,10 +222,6 @@ bool mlx5_sd_is_supported(struct mlx5_core_dev *dev)
if (!mlx5_core_is_pf(dev))
return false;
- /* Block on embedded CPU PFs */
- if (mlx5_core_is_ecpf(dev))
- return false;
-
err = mlx5_query_nic_vport_sd_group(dev, &sd_group);
if (err || !sd_group)
return false;
@@ -223,10 +248,6 @@ static int sd_init(struct mlx5_core_dev *dev)
if (!mlx5_core_is_pf(dev))
return 0;
- /* Block on embedded CPU PFs */
- if (mlx5_core_is_ecpf(dev))
- return 0;
-
err = mlx5_query_nic_vport_sd_group(dev, &sd_group);
if (err)
return err;
@@ -374,62 +395,196 @@ static void sd_lag_cleanup(struct mlx5_core_dev *dev)
mutex_unlock(&ldev->lock);
}
+enum {
+ SD_PRIMARY_SET,
+ SD_SECONDARIES_SET,
+ SD_FW_SILENT_CHECK,
+};
+
+static int sd_handle_fw_silent_check(struct mlx5_core_dev *dev,
+ struct mlx5_core_dev *peer)
+{
+ struct mlx5_sd *peer_sd = mlx5_get_sd(peer);
+ struct mlx5_sd *sd = mlx5_get_sd(dev);
+ u8 dev_silent = 0, peer_silent = 0;
+ int err;
+
+ if (peer_sd->fw_silents_secondaries) {
+ sd->fw_silents_secondaries = true;
+ return 0;
+ }
+
+ err = mlx5_fs_cmd_query_l2table_silent(dev, &dev_silent);
+ if (err) {
+ sd_warn(dev, "Failed to query silent mode for dev: %d\n", err);
+ return err;
+ }
+
+ err = mlx5_fs_cmd_query_l2table_silent(peer, &peer_silent);
+ if (err) {
+ sd_warn(dev, "Failed to query silent mode for peer: %d\n", err);
+ return err;
+ }
+
+ if (dev_silent || peer_silent) {
+ sd->fw_silents_secondaries = true;
+ peer_sd->fw_silents_secondaries = true;
+ sd_info(dev, "FW indicates at least one device is silent\n");
+ }
+ return 0;
+}
+
+static int sd_handle_primary_set(struct mlx5_core_dev *dev,
+ struct mlx5_core_dev *peer)
+{
+ struct mlx5_sd *peer_sd = mlx5_get_sd(peer);
+ struct mlx5_sd *sd = mlx5_get_sd(dev);
+ struct mlx5_core_dev *candidate;
+ struct mlx5_sd *candidate_sd;
+ bool dev_should_be_primary;
+
+ /* Peer is the device that is being sent to all the other devices in
+ * the group. Hence, use peer to get the candidate device.
+ */
+ candidate = peer_sd->primary ? peer : peer_sd->primary_dev;
+
+ if (sd->fw_silents_secondaries) {
+ u8 candidate_silent = 0;
+ int err;
+
+ err = mlx5_fs_cmd_query_l2table_silent(candidate,
+ &candidate_silent);
+ if (err) {
+ sd_warn(candidate, "Failed to query silent mode for dev: %d\n",
+ err);
+ return err;
+ }
+ /* Candidate is silent, dev should be primary */
+ dev_should_be_primary = candidate_silent;
+ } else {
+ /* No FW silent mode, use bus number */
+ dev_should_be_primary =
+ dev->pdev->bus->number < candidate->pdev->bus->number;
+ }
+
+ if (!dev_should_be_primary)
+ return 0;
+
+ candidate_sd = mlx5_get_sd(candidate);
+
+ sd->primary = true;
+ candidate_sd->primary = false;
+ candidate_sd->primary_dev = dev;
+ peer_sd->primary = false;
+ peer_sd->primary_dev = dev;
+ return 0;
+}
+
+static void sd_handle_secondaries_set(struct mlx5_core_dev *dev,
+ struct mlx5_core_dev *peer)
+{
+ struct mlx5_sd *peer_sd = mlx5_get_sd(peer);
+ struct mlx5_sd *sd = mlx5_get_sd(dev);
+ u8 idx;
+
+ /* Primary has nothing to register with itself. */
+ if (sd->primary)
+ return;
+
+ /* dev is a secondary device, peer is the primary device.
+ * Secondary registers itself with the primary.
+ */
+ idx = peer_sd->next_secondary_idx++;
+ peer_sd->secondaries[idx] = dev;
+ sd->primary_dev = peer;
+}
+
+static int mlx5_sd_devcom_event(int event, void *my_data, void *event_data)
+{
+ struct mlx5_core_dev *peer = event_data;
+ struct mlx5_core_dev *dev = my_data;
+
+ switch (event) {
+ case SD_FW_SILENT_CHECK:
+ return sd_handle_fw_silent_check(dev, peer);
+ case SD_PRIMARY_SET:
+ return sd_handle_primary_set(dev, peer);
+ case SD_SECONDARIES_SET:
+ sd_handle_secondaries_set(dev, peer);
+ return 0;
+ }
+
+ return 0;
+}
+
static int sd_register(struct mlx5_core_dev *dev)
{
- struct mlx5_devcom_comp_dev *devcom, *pos;
struct mlx5_devcom_match_attr attr = {};
- struct mlx5_core_dev *peer, *primary;
- struct mlx5_sd *sd, *primary_sd;
- int err, i;
+ struct mlx5_devcom_comp_dev *devcom;
+ struct mlx5_core_dev *primary;
+ struct mlx5_sd *primary_sd;
+ struct mlx5_sd *sd;
+ int err;
sd = mlx5_get_sd(dev);
attr.key.val = sd->group_id;
attr.flags = MLX5_DEVCOM_MATCH_FLAGS_NS;
attr.net = mlx5_core_net(dev);
- devcom = mlx5_devcom_register_component(dev->priv.devc, MLX5_DEVCOM_SD_GROUP,
- &attr, NULL, dev);
+ devcom = mlx5_devcom_register_component(dev->priv.devc,
+ MLX5_DEVCOM_SD_GROUP,
+ &attr, mlx5_sd_devcom_event,
+ dev);
if (!devcom)
return -EINVAL;
sd->devcom = devcom;
- if (mlx5_devcom_comp_get_size(devcom) != sd->host_buses)
- return 0;
-
mlx5_devcom_comp_lock(devcom);
- mlx5_devcom_comp_set_ready(devcom, true);
- mlx5_devcom_comp_unlock(devcom);
+ if (mlx5_devcom_comp_get_size(devcom) != sd->host_buses ||
+ mlx5_devcom_comp_is_ready(devcom))
+ goto out;
- if (!mlx5_devcom_for_each_peer_begin(devcom)) {
- err = -ENODEV;
- goto err_devcom_unreg;
+ /* If silent mode query is supported, ask each device whether it is
+ * silent and propagate the result to the whole group. In each group
+ * only one device is not silent.
+ */
+ if (MLX5_CAP_GEN(dev, silent_mode_query)) {
+ err = mlx5_devcom_locked_send_event(devcom, SD_FW_SILENT_CHECK,
+ SD_FW_SILENT_CHECK, dev);
+ if (err)
+ goto err_devcom_unreg;
}
- primary = dev;
- mlx5_devcom_for_each_peer_entry(devcom, peer, pos)
- if (peer->pdev->bus->number < primary->pdev->bus->number)
- primary = peer;
+ /* Send SD_PRIMARY_SET event with this device.
+ * All peers will receive this event and compare to this device.
+ * If fw_silents_secondaries is set, choose non-silent device.
+ * Otherwise use bus number.
+ */
+ sd->primary = true;
+ err = mlx5_devcom_locked_send_event(devcom, SD_PRIMARY_SET,
+ SD_PRIMARY_SET, dev);
+ if (err)
+ goto err_devcom_unreg;
- primary_sd = mlx5_get_sd(primary);
- primary_sd->primary = true;
- i = 0;
- /* loop the secondaries */
- mlx5_devcom_for_each_peer_entry(primary_sd->devcom, peer, pos) {
- struct mlx5_sd *peer_sd = mlx5_get_sd(peer);
-
- primary_sd->secondaries[i++] = peer;
- peer_sd->primary = false;
- peer_sd->primary_dev = primary;
- }
+ /* Broadcast SD_SECONDARIES_SET. Each non-sender peer's handler runs;
+ * the primary's handler returns early so only secondaries register.
+ */
+ primary = sd->primary ? dev : sd->primary_dev;
+ if (!sd->primary)
+ sd_handle_secondaries_set(dev, primary);
+ mlx5_devcom_locked_send_event(devcom, SD_SECONDARIES_SET,
+ DEVCOM_CANT_FAIL, primary);
- mlx5_devcom_for_each_peer_end(devcom);
+ primary_sd = mlx5_get_sd(primary);
+ if (primary_sd->next_secondary_idx + 1 == sd->host_buses)
+ mlx5_devcom_comp_set_ready(devcom, true);
+out:
+ mlx5_devcom_comp_unlock(devcom);
return 0;
err_devcom_unreg:
- mlx5_devcom_comp_lock(sd->devcom);
- mlx5_devcom_comp_set_ready(sd->devcom, false);
- mlx5_devcom_comp_unlock(sd->devcom);
- mlx5_devcom_unregister_component(sd->devcom);
+ mlx5_devcom_comp_unlock(devcom);
+ mlx5_devcom_unregister_component(devcom);
return err;
}
@@ -513,6 +668,29 @@ static void sd_secondary_destroy_alias_ft(struct mlx5_core_dev *secondary)
MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS);
}
+static int mlx5_sd_secondary_conf_tx_root(struct mlx5_core_dev *secondary,
+ bool disconnect)
+{
+ struct mlx5_sd *sd = mlx5_get_sd(secondary);
+ int err;
+
+ /* Idempotent: skip if TX root is already in the requested state. */
+ if (sd->tx_root_silent == disconnect)
+ return 0;
+
+ if (disconnect)
+ err = mlx5_fs_cmd_set_tx_flow_table_root(secondary, 0, true);
+ else
+ err = mlx5_fs_cmd_set_tx_flow_table_root(secondary,
+ sd->alias_obj_id,
+ false);
+ if (err)
+ return err;
+
+ sd->tx_root_silent = disconnect;
+ return 0;
+}
+
static int sd_cmd_set_secondary(struct mlx5_core_dev *secondary,
struct mlx5_core_dev *primary,
u8 *alias_key)
@@ -521,33 +699,42 @@ static int sd_cmd_set_secondary(struct mlx5_core_dev *secondary,
struct mlx5_sd *sd = mlx5_get_sd(secondary);
int err;
- err = mlx5_fs_cmd_set_l2table_entry_silent(secondary, 1);
- if (err)
- return err;
+ if (!primary_sd->fw_silents_secondaries) {
+ err = mlx5_fs_cmd_set_l2table_entry_silent(secondary, 1);
+ if (err)
+ return err;
+ }
err = sd_secondary_create_alias_ft(secondary, primary, primary_sd->tx_ft,
&sd->alias_obj_id, alias_key);
if (err)
goto err_unset_silent;
- err = mlx5_fs_cmd_set_tx_flow_table_root(secondary, sd->alias_obj_id, false);
+ err = mlx5_fs_cmd_set_tx_flow_table_root(secondary, sd->alias_obj_id,
+ false);
if (err)
goto err_destroy_alias_ft;
+ sd->tx_root_silent = false;
return 0;
err_destroy_alias_ft:
sd_secondary_destroy_alias_ft(secondary);
err_unset_silent:
- mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
+ if (!primary_sd->fw_silents_secondaries)
+ mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
return err;
}
static void sd_cmd_unset_secondary(struct mlx5_core_dev *secondary)
{
- mlx5_fs_cmd_set_tx_flow_table_root(secondary, 0, true);
+ struct mlx5_sd *primary_sd;
+
+ primary_sd = mlx5_get_sd(mlx5_sd_get_primary(secondary));
+ mlx5_sd_secondary_conf_tx_root(secondary, true);
sd_secondary_destroy_alias_ft(secondary);
- mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
+ if (!primary_sd->fw_silents_secondaries)
+ mlx5_fs_cmd_set_l2table_entry_silent(secondary, 0);
}
static void sd_print_group(struct mlx5_core_dev *primary)
@@ -672,6 +859,7 @@ err_sd_unregister:
peer_sd->primary_dev = NULL;
}
primary_sd->primary = false;
+ primary_sd->next_secondary_idx = 0;
mlx5_devcom_comp_set_ready(sd->devcom, false);
mlx5_devcom_comp_unlock(sd->devcom);
sd_unregister(dev);
@@ -719,6 +907,7 @@ out_clear_peers:
peer_sd->primary_dev = NULL;
}
primary_sd->primary = false;
+ primary_sd->next_secondary_idx = 0;
out_ready_false:
mlx5_devcom_comp_set_ready(sd->devcom, false);
out_unlock:
@@ -771,6 +960,127 @@ struct auxiliary_device *mlx5_sd_get_adev(struct mlx5_core_dev *dev,
return &primary_adev->adev;
}
+#ifdef CONFIG_MLX5_ESWITCH
+/* All SD members must have completed esw_offloads_enable (i.e., reached
+ * mlx5_esw_offloads_devcom_init) and become eswitch-peers of the primary.
+ * Until then, mlx5_eswitch_is_peer() returns false for the not-yet-paired
+ * member and shared_fdb_supported_filter would reject. When all PFs transition
+ * in parallel, only the last one to finish satisfies this gate; the earlier
+ * ones return 0 silently here.
+ */
+static bool mlx5_sd_all_paired(struct mlx5_core_dev *primary)
+{
+ struct mlx5_eswitch *primary_esw = primary->priv.eswitch;
+ struct mlx5_core_dev *pos;
+ int i;
+
+ mlx5_sd_for_each_secondary(i, primary, pos) {
+ if (!mlx5_eswitch_is_peer(primary_esw, pos->priv.eswitch))
+ return false;
+ }
+ return true;
+}
+
+static void mlx5_sd_activate_shared_fdb(struct mlx5_core_dev *primary)
+{
+ struct mlx5_sd *sd = mlx5_get_sd(primary);
+ struct mlx5_core_dev *pos;
+ struct mlx5_lag *ldev;
+ struct lag_func *pf;
+ int err;
+ int i;
+
+ ldev = mlx5_lag_dev(primary);
+ if (!ldev) {
+ sd_warn(primary, "Shared FDB MUST have ldev\n");
+ return;
+ }
+
+ mutex_lock(&ldev->lock);
+
+ if (ldev->mode_changes_in_progress)
+ goto unlock;
+
+ if (!mlx5_sd_all_paired(primary))
+ goto unlock;
+
+ /* Check if SD FDB is already active for this group */
+ mlx5_lag_for_each(i, 0, ldev, sd->group_id) {
+ pf = mlx5_lag_pf(ldev, i);
+ if (pf->sd_fdb_active)
+ goto unlock;
+ break;
+ }
+
+ if (!mlx5_lag_shared_fdb_supported_filter(ldev, sd->group_id)) {
+ sd_warn(primary, "Shared FDB not supported\n");
+ goto unlock;
+ }
+
+ /* Initialize vport metadata for all group devices. This is deferred
+ * from esw_offloads_enable() because mlx5_sd_pf_num_get() requires
+ * the SD group to be ready.
+ */
+ mlx5_sd_for_each_dev(i, primary, pos) {
+ struct mlx5_eswitch *esw = pos->priv.eswitch;
+
+ err = mlx5_esw_offloads_init_deferred_metadata(esw);
+ if (err) {
+ sd_warn(primary, "Failed to init metadata for %s: %d\n",
+ dev_name(pos->device), err);
+ goto unlock;
+ }
+ }
+
+ err = mlx5_lag_shared_fdb_create(ldev, NULL, 0, sd->group_id);
+ if (err)
+ sd_warn(primary, "Failed to create shared FDB: %d\n", err);
+ else
+ sd_info(primary, "Shared FDB created\n");
+
+unlock:
+ mutex_unlock(&ldev->lock);
+}
+
+void mlx5_sd_eswitch_mode_set(struct mlx5_core_dev *dev, u16 mlx5_mode)
+{
+ struct mlx5_core_dev *primary;
+ struct mlx5_sd *sd;
+ int err;
+
+ sd = mlx5_get_sd(dev);
+ if (!sd || !mlx5_devcom_comp_is_ready(sd->devcom))
+ return;
+
+ mlx5_devcom_comp_lock(sd->devcom);
+ if (!mlx5_devcom_comp_is_ready(sd->devcom))
+ goto unlock;
+
+ primary = mlx5_sd_get_primary(dev);
+
+ /* Secondary devices need TX root reconfiguration */
+ if (dev != primary) {
+ bool disconnect = (mlx5_mode == MLX5_ESWITCH_OFFLOADS);
+
+ err = mlx5_sd_secondary_conf_tx_root(dev, disconnect);
+ if (err) {
+ sd_warn(dev, "Failed to set TX root: %d\n", err);
+ goto unlock;
+ }
+ }
+
+ /* Try to activate shared FDB when all devices are in switchdev.
+ * Shared FDB is optional - failure here doesn't fail the transition.
+ */
+ if (mlx5_mode == MLX5_ESWITCH_OFFLOADS)
+ mlx5_sd_activate_shared_fdb(primary);
+
+unlock:
+ mlx5_devcom_comp_unlock(sd->devcom);
+}
+
+#endif /* CONFIG_MLX5_ESWITCH */
+
void mlx5_sd_put_adev(struct auxiliary_device *actual_adev,
struct auxiliary_device *adev)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
index 011702ff6f02..cb88bf34079a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h
@@ -12,6 +12,7 @@ struct mlx5_sd;
struct mlx5_core_dev *mlx5_sd_get_primary(struct mlx5_core_dev *dev);
bool mlx5_sd_is_primary(struct mlx5_core_dev *dev);
+int mlx5_sd_pf_num_get(struct mlx5_core_dev *dev);
struct mlx5_core_dev *mlx5_sd_primary_get_peer(struct mlx5_core_dev *primary, int idx);
int mlx5_sd_ch_ix_get_dev_ix(struct mlx5_core_dev *dev, int ch_ix);
int mlx5_sd_ch_ix_get_vec_ix(struct mlx5_core_dev *dev, int ch_ix);
@@ -44,6 +45,13 @@ mlx5_sd_get_devcom(struct mlx5_core_dev *dev)
}
#endif
+#ifdef CONFIG_MLX5_ESWITCH
+void mlx5_sd_eswitch_mode_set(struct mlx5_core_dev *dev, u16 mlx5_mode);
+#else
+static inline void
+mlx5_sd_eswitch_mode_set(struct mlx5_core_dev *dev, u16 mlx5_mode) { return; }
+#endif
+
#define mlx5_sd_for_each_dev_from_to(i, primary, ix_from, to, pos) \
for (i = ix_from; \
(pos = mlx5_sd_primary_get_peer(primary, i)) && pos != (to); i++)