From 8ca07176ab00a6d06a9b254dcbb2514b4d607e9c Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 19 Jul 2021 16:51:39 +0300 Subject: net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE Currently DSA has an issue with FDB entries pointing towards the bridge in the presence of br_fdb_replay() being called at port join and leave time. In particular, each bridge port will ask for a replay for the FDB entries pointing towards the bridge when it joins, and for another replay when it leaves. This means that for example, a bridge with 4 switch ports will notify DSA 4 times of the bridge MAC address. But if the MAC address of the bridge changes during the normal runtime of the system, the bridge notifies switchdev [ once ] of the deletion of the old MAC address as a local FDB towards the bridge, and of the insertion [ again once ] of the new MAC address as a local FDB. This is a problem, because DSA keeps the old MAC address as a host FDB entry with refcount 4 (4 ports asked for it using br_fdb_replay). So the old MAC address will not be deleted. Additionally, the new MAC address will only be installed with refcount 1, and when the first switch port leaves the bridge (leaving 3 others as still members), it will delete with it the new MAC address of the bridge from the local FDB entries kept by DSA (because the br_fdb_replay call on deletion will bring the entry's refcount from 1 to 0). So the problem, really, is that the number of br_fdb_replay() calls is not matched with the refcount that a host FDB is offloaded to DSA during normal runtime. An elegant way to solve the problem would be to make the switchdev notification emitted by br_fdb_change_mac_address() result in a host FDB kept by DSA which has a refcount exactly equal to the number of ports under that bridge. Then, no matter how many DSA ports join or leave that bridge, the host FDB entry will always be deleted when there are exactly zero remaining DSA switch ports members of the bridge. To implement the proposed solution, we remember that the switchdev objects and port attributes have some helpers provided by switchdev, which can be optionally called by drivers: switchdev_handle_port_obj_{add,del} and switchdev_handle_port_attr_set. These helpers: - fan out a switchdev object/attribute emitted for the bridge towards all the lower interfaces that pass the check_cb(). - fan out a switchdev object/attribute emitted for a bridge port that is a LAG towards all the lower interfaces that pass the check_cb(). In other words, this is the model we need for the FDB events too: something that will keep an FDB entry emitted towards a physical port as it is, but translate an FDB entry emitted towards the bridge into N FDB entries, one per physical port. Of course, there are many differences between fanning out a switchdev object (VLAN) on 3 lower interfaces of a LAG and fanning out an FDB entry on 3 lower interfaces of a LAG. Intuitively, an FDB entry towards a LAG should be treated specially, because FDB entries are unicast, we can't just install the same address towards 3 destinations. It is imaginable that drivers might want to treat this case specifically, so create some methods for this case and do not recurse into the LAG lower ports, just the bridge ports. DSA also listens for FDB entries on "foreign" interfaces, aka interfaces bridged with us which are not part of our hardware domain: think an Ethernet switch bridged with a Wi-Fi AP. For those addresses, DSA installs host FDB entries. However, there we have the same problem (those host FDB entries are installed with a refcount of only 1) and an even bigger one which we did not have with FDB entries towards the bridge: br_fdb_replay() is currently not called for FDB entries on foreign interfaces, just for the physical port and for the bridge itself. So when DSA sniffs an address learned by the software bridge towards a foreign interface like an e1000 port, and then that e1000 leaves the bridge, DSA remains with the dangling host FDB address. That will be fixed separately by replaying all FDB entries and not just the ones towards the port and the bridge. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/switchdev/switchdev.c | 190 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 190 insertions(+) (limited to 'net/switchdev/switchdev.c') diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c index 070698dd19bc..82dd4e4e86f5 100644 --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@ -378,6 +378,196 @@ int call_switchdev_blocking_notifiers(unsigned long val, struct net_device *dev, } EXPORT_SYMBOL_GPL(call_switchdev_blocking_notifiers); +static int __switchdev_handle_fdb_add_to_device(struct net_device *dev, + const struct net_device *orig_dev, + const struct switchdev_notifier_fdb_info *fdb_info, + bool (*check_cb)(const struct net_device *dev), + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev), + int (*add_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info), + int (*lag_add_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info)) +{ + const struct switchdev_notifier_info *info = &fdb_info->info; + struct net_device *lower_dev; + struct list_head *iter; + int err = -EOPNOTSUPP; + + if (check_cb(dev)) { + /* Handle FDB entries on foreign interfaces as FDB entries + * towards the software bridge. + */ + if (foreign_dev_check_cb && foreign_dev_check_cb(dev, orig_dev)) { + struct net_device *br = netdev_master_upper_dev_get_rcu(dev); + + if (!br || !netif_is_bridge_master(br)) + return 0; + + /* No point in handling FDB entries on a foreign bridge */ + if (foreign_dev_check_cb(dev, br)) + return 0; + + return __switchdev_handle_fdb_add_to_device(br, orig_dev, + fdb_info, check_cb, + foreign_dev_check_cb, + add_cb, lag_add_cb); + } + + return add_cb(dev, orig_dev, info->ctx, fdb_info); + } + + /* If we passed over the foreign check, it means that the LAG interface + * is offloaded. + */ + if (netif_is_lag_master(dev)) { + if (!lag_add_cb) + return -EOPNOTSUPP; + + return lag_add_cb(dev, orig_dev, info->ctx, fdb_info); + } + + /* Recurse through lower interfaces in case the FDB entry is pointing + * towards a bridge device. + */ + netdev_for_each_lower_dev(dev, lower_dev, iter) { + /* Do not propagate FDB entries across bridges */ + if (netif_is_bridge_master(lower_dev)) + continue; + + err = __switchdev_handle_fdb_add_to_device(lower_dev, orig_dev, + fdb_info, check_cb, + foreign_dev_check_cb, + add_cb, lag_add_cb); + if (err && err != -EOPNOTSUPP) + return err; + } + + return err; +} + +int switchdev_handle_fdb_add_to_device(struct net_device *dev, + const struct switchdev_notifier_fdb_info *fdb_info, + bool (*check_cb)(const struct net_device *dev), + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev), + int (*add_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info), + int (*lag_add_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info)) +{ + int err; + + err = __switchdev_handle_fdb_add_to_device(dev, dev, fdb_info, + check_cb, + foreign_dev_check_cb, + add_cb, lag_add_cb); + if (err == -EOPNOTSUPP) + err = 0; + + return err; +} +EXPORT_SYMBOL_GPL(switchdev_handle_fdb_add_to_device); + +static int __switchdev_handle_fdb_del_to_device(struct net_device *dev, + const struct net_device *orig_dev, + const struct switchdev_notifier_fdb_info *fdb_info, + bool (*check_cb)(const struct net_device *dev), + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev), + int (*del_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info), + int (*lag_del_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info)) +{ + const struct switchdev_notifier_info *info = &fdb_info->info; + struct net_device *lower_dev; + struct list_head *iter; + int err = -EOPNOTSUPP; + + if (check_cb(dev)) { + /* Handle FDB entries on foreign interfaces as FDB entries + * towards the software bridge. + */ + if (foreign_dev_check_cb && foreign_dev_check_cb(dev, orig_dev)) { + struct net_device *br = netdev_master_upper_dev_get_rcu(dev); + + if (!br || !netif_is_bridge_master(br)) + return 0; + + /* No point in handling FDB entries on a foreign bridge */ + if (foreign_dev_check_cb(dev, br)) + return 0; + + return __switchdev_handle_fdb_del_to_device(br, orig_dev, + fdb_info, check_cb, + foreign_dev_check_cb, + del_cb, lag_del_cb); + } + + return del_cb(dev, orig_dev, info->ctx, fdb_info); + } + + /* If we passed over the foreign check, it means that the LAG interface + * is offloaded. + */ + if (netif_is_lag_master(dev)) { + if (!lag_del_cb) + return -EOPNOTSUPP; + + return lag_del_cb(dev, orig_dev, info->ctx, fdb_info); + } + + /* Recurse through lower interfaces in case the FDB entry is pointing + * towards a bridge device. + */ + netdev_for_each_lower_dev(dev, lower_dev, iter) { + /* Do not propagate FDB entries across bridges */ + if (netif_is_bridge_master(lower_dev)) + continue; + + err = switchdev_handle_fdb_del_to_device(lower_dev, fdb_info, + check_cb, + foreign_dev_check_cb, + del_cb, lag_del_cb); + if (err && err != -EOPNOTSUPP) + return err; + } + + return err; +} + +int switchdev_handle_fdb_del_to_device(struct net_device *dev, + const struct switchdev_notifier_fdb_info *fdb_info, + bool (*check_cb)(const struct net_device *dev), + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev), + int (*del_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info), + int (*lag_del_cb)(struct net_device *dev, + const struct net_device *orig_dev, const void *ctx, + const struct switchdev_notifier_fdb_info *fdb_info)) +{ + int err; + + err = __switchdev_handle_fdb_del_to_device(dev, dev, fdb_info, + check_cb, + foreign_dev_check_cb, + del_cb, lag_del_cb); + if (err == -EOPNOTSUPP) + err = 0; + + return err; +} +EXPORT_SYMBOL_GPL(switchdev_handle_fdb_del_to_device); + static int __switchdev_handle_port_obj_add(struct net_device *dev, struct switchdev_notifier_port_obj_info *port_obj_info, bool (*check_cb)(const struct net_device *dev), -- cgit v1.2.3 From 71f4f89a0324459f81656f3f9b20c1c0becaf647 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 20 Jul 2021 20:35:57 +0300 Subject: net: switchdev: recurse into __switchdev_handle_fdb_del_to_device The difference between __switchdev_handle_fdb_del_to_device and switchdev_handle_del_to_device is that the former takes an extra orig_dev argument, while the latter starts with dev == orig_dev. We should recurse into the variant that does not lose the orig_dev along the way. This is relevant when deleting FDB entries pointing towards a bridge (dev changes to the lower interfaces, but orig_dev shouldn't). The addition helper already recurses properly, just the deletion one doesn't. Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE") Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/switchdev/switchdev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'net/switchdev/switchdev.c') diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c index 82dd4e4e86f5..42e88d3d66a7 100644 --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@ -532,10 +532,10 @@ static int __switchdev_handle_fdb_del_to_device(struct net_device *dev, if (netif_is_bridge_master(lower_dev)) continue; - err = switchdev_handle_fdb_del_to_device(lower_dev, fdb_info, - check_cb, - foreign_dev_check_cb, - del_cb, lag_del_cb); + err = __switchdev_handle_fdb_del_to_device(lower_dev, orig_dev, + fdb_info, check_cb, + foreign_dev_check_cb, + del_cb, lag_del_cb); if (err && err != -EOPNOTSUPP) return err; } -- cgit v1.2.3 From 2b0a5688493ab0f54774a2f89d3d91ab238f7ab4 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 22 Jul 2021 02:05:55 +0300 Subject: net: switchdev: fix FDB entries towards foreign ports not getting propagated to us The newly introduced switchdev_handle_fdb_{add,del}_to_device helpers solved a problem but introduced another one. They have a severe design bug: they do not propagate FDB events on foreign interfaces to us, i.e. this use case: br0 / \ / \ / \ / \ swp0 eno0 (switchdev) (foreign) when an address is learned on eno0, what is supposed to happen is that this event should also be propagated towards swp0. Somehow I managed to convince myself that this did work correctly, but obviously it does not. The trouble with foreign interfaces is that we must reach a switchdev net_device pointer through a foreign net_device that has no direct upper/lower relationship with it. So we need to do exploratory searching through the lower interfaces of the foreign net_device's bridge upper (to reach swp0 from eno0, we must check its upper, br0, for lower interfaces that pass the check_cb and foreign_dev_check_cb). This is something that the previous code did not do, it just assumed that "dev" will become a switchdev interface at some point, somehow, probably by magic. With this patch, assisted address learning on the CPU port works again in DSA: ip link add br0 type bridge ip link set swp0 master br0 ip link set eno0 master br0 ip link set br0 up [ 46.708929] mscc_felix 0000:00:00.5 swp0: Adding FDB entry towards eno0, addr 00:04:9f:05:f4:ab vid 0 as host address Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE") Reported-by: Eric Woudstra Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/switchdev/switchdev.c | 214 ++++++++++++++++++++++++++++++---------------- 1 file changed, 142 insertions(+), 72 deletions(-) (limited to 'net/switchdev/switchdev.c') diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c index 42e88d3d66a7..0ae3478561f4 100644 --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@ -378,6 +378,56 @@ int call_switchdev_blocking_notifiers(unsigned long val, struct net_device *dev, } EXPORT_SYMBOL_GPL(call_switchdev_blocking_notifiers); +struct switchdev_nested_priv { + bool (*check_cb)(const struct net_device *dev); + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev); + const struct net_device *dev; + struct net_device *lower_dev; +}; + +static int switchdev_lower_dev_walk(struct net_device *lower_dev, + struct netdev_nested_priv *priv) +{ + struct switchdev_nested_priv *switchdev_priv = priv->data; + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev); + bool (*check_cb)(const struct net_device *dev); + const struct net_device *dev; + + check_cb = switchdev_priv->check_cb; + foreign_dev_check_cb = switchdev_priv->foreign_dev_check_cb; + dev = switchdev_priv->dev; + + if (check_cb(lower_dev) && !foreign_dev_check_cb(lower_dev, dev)) { + switchdev_priv->lower_dev = lower_dev; + return 1; + } + + return 0; +} + +static struct net_device * +switchdev_lower_dev_find(struct net_device *dev, + bool (*check_cb)(const struct net_device *dev), + bool (*foreign_dev_check_cb)(const struct net_device *dev, + const struct net_device *foreign_dev)) +{ + struct switchdev_nested_priv switchdev_priv = { + .check_cb = check_cb, + .foreign_dev_check_cb = foreign_dev_check_cb, + .dev = dev, + .lower_dev = NULL, + }; + struct netdev_nested_priv priv = { + .data = &switchdev_priv, + }; + + netdev_walk_all_lower_dev_rcu(dev, switchdev_lower_dev_walk, &priv); + + return switchdev_priv.lower_dev; +} + static int __switchdev_handle_fdb_add_to_device(struct net_device *dev, const struct net_device *orig_dev, const struct switchdev_notifier_fdb_info *fdb_info, @@ -392,37 +442,18 @@ static int __switchdev_handle_fdb_add_to_device(struct net_device *dev, const struct switchdev_notifier_fdb_info *fdb_info)) { const struct switchdev_notifier_info *info = &fdb_info->info; - struct net_device *lower_dev; + struct net_device *br, *lower_dev; struct list_head *iter; int err = -EOPNOTSUPP; - if (check_cb(dev)) { - /* Handle FDB entries on foreign interfaces as FDB entries - * towards the software bridge. - */ - if (foreign_dev_check_cb && foreign_dev_check_cb(dev, orig_dev)) { - struct net_device *br = netdev_master_upper_dev_get_rcu(dev); - - if (!br || !netif_is_bridge_master(br)) - return 0; - - /* No point in handling FDB entries on a foreign bridge */ - if (foreign_dev_check_cb(dev, br)) - return 0; - - return __switchdev_handle_fdb_add_to_device(br, orig_dev, - fdb_info, check_cb, - foreign_dev_check_cb, - add_cb, lag_add_cb); - } - + if (check_cb(dev)) return add_cb(dev, orig_dev, info->ctx, fdb_info); - } - /* If we passed over the foreign check, it means that the LAG interface - * is offloaded. - */ if (netif_is_lag_master(dev)) { + if (!switchdev_lower_dev_find(dev, check_cb, foreign_dev_check_cb)) + goto maybe_bridged_with_us; + + /* This is a LAG interface that we offload */ if (!lag_add_cb) return -EOPNOTSUPP; @@ -432,20 +463,49 @@ static int __switchdev_handle_fdb_add_to_device(struct net_device *dev, /* Recurse through lower interfaces in case the FDB entry is pointing * towards a bridge device. */ - netdev_for_each_lower_dev(dev, lower_dev, iter) { - /* Do not propagate FDB entries across bridges */ - if (netif_is_bridge_master(lower_dev)) - continue; + if (netif_is_bridge_master(dev)) { + if (!switchdev_lower_dev_find(dev, check_cb, foreign_dev_check_cb)) + return 0; + + /* This is a bridge interface that we offload */ + netdev_for_each_lower_dev(dev, lower_dev, iter) { + /* Do not propagate FDB entries across bridges */ + if (netif_is_bridge_master(lower_dev)) + continue; + + /* Bridge ports might be either us, or LAG interfaces + * that we offload. + */ + if (!check_cb(lower_dev) && + !switchdev_lower_dev_find(lower_dev, check_cb, + foreign_dev_check_cb)) + continue; + + err = __switchdev_handle_fdb_add_to_device(lower_dev, orig_dev, + fdb_info, check_cb, + foreign_dev_check_cb, + add_cb, lag_add_cb); + if (err && err != -EOPNOTSUPP) + return err; + } - err = __switchdev_handle_fdb_add_to_device(lower_dev, orig_dev, - fdb_info, check_cb, - foreign_dev_check_cb, - add_cb, lag_add_cb); - if (err && err != -EOPNOTSUPP) - return err; + return 0; } - return err; +maybe_bridged_with_us: + /* Event is neither on a bridge nor a LAG. Check whether it is on an + * interface that is in a bridge with us. + */ + br = netdev_master_upper_dev_get_rcu(dev); + if (!br || !netif_is_bridge_master(br)) + return 0; + + if (!switchdev_lower_dev_find(br, check_cb, foreign_dev_check_cb)) + return 0; + + return __switchdev_handle_fdb_add_to_device(br, orig_dev, fdb_info, + check_cb, foreign_dev_check_cb, + add_cb, lag_add_cb); } int switchdev_handle_fdb_add_to_device(struct net_device *dev, @@ -487,37 +547,18 @@ static int __switchdev_handle_fdb_del_to_device(struct net_device *dev, const struct switchdev_notifier_fdb_info *fdb_info)) { const struct switchdev_notifier_info *info = &fdb_info->info; - struct net_device *lower_dev; + struct net_device *br, *lower_dev; struct list_head *iter; int err = -EOPNOTSUPP; - if (check_cb(dev)) { - /* Handle FDB entries on foreign interfaces as FDB entries - * towards the software bridge. - */ - if (foreign_dev_check_cb && foreign_dev_check_cb(dev, orig_dev)) { - struct net_device *br = netdev_master_upper_dev_get_rcu(dev); - - if (!br || !netif_is_bridge_master(br)) - return 0; - - /* No point in handling FDB entries on a foreign bridge */ - if (foreign_dev_check_cb(dev, br)) - return 0; - - return __switchdev_handle_fdb_del_to_device(br, orig_dev, - fdb_info, check_cb, - foreign_dev_check_cb, - del_cb, lag_del_cb); - } - + if (check_cb(dev)) return del_cb(dev, orig_dev, info->ctx, fdb_info); - } - /* If we passed over the foreign check, it means that the LAG interface - * is offloaded. - */ if (netif_is_lag_master(dev)) { + if (!switchdev_lower_dev_find(dev, check_cb, foreign_dev_check_cb)) + goto maybe_bridged_with_us; + + /* This is a LAG interface that we offload */ if (!lag_del_cb) return -EOPNOTSUPP; @@ -527,20 +568,49 @@ static int __switchdev_handle_fdb_del_to_device(struct net_device *dev, /* Recurse through lower interfaces in case the FDB entry is pointing * towards a bridge device. */ - netdev_for_each_lower_dev(dev, lower_dev, iter) { - /* Do not propagate FDB entries across bridges */ - if (netif_is_bridge_master(lower_dev)) - continue; + if (netif_is_bridge_master(dev)) { + if (!switchdev_lower_dev_find(dev, check_cb, foreign_dev_check_cb)) + return 0; + + /* This is a bridge interface that we offload */ + netdev_for_each_lower_dev(dev, lower_dev, iter) { + /* Do not propagate FDB entries across bridges */ + if (netif_is_bridge_master(lower_dev)) + continue; + + /* Bridge ports might be either us, or LAG interfaces + * that we offload. + */ + if (!check_cb(lower_dev) && + !switchdev_lower_dev_find(lower_dev, check_cb, + foreign_dev_check_cb)) + continue; + + err = __switchdev_handle_fdb_del_to_device(lower_dev, orig_dev, + fdb_info, check_cb, + foreign_dev_check_cb, + del_cb, lag_del_cb); + if (err && err != -EOPNOTSUPP) + return err; + } - err = __switchdev_handle_fdb_del_to_device(lower_dev, orig_dev, - fdb_info, check_cb, - foreign_dev_check_cb, - del_cb, lag_del_cb); - if (err && err != -EOPNOTSUPP) - return err; + return 0; } - return err; +maybe_bridged_with_us: + /* Event is neither on a bridge nor a LAG. Check whether it is on an + * interface that is in a bridge with us. + */ + br = netdev_master_upper_dev_get_rcu(dev); + if (!br || !netif_is_bridge_master(br)) + return 0; + + if (!switchdev_lower_dev_find(br, check_cb, foreign_dev_check_cb)) + return 0; + + return __switchdev_handle_fdb_del_to_device(br, orig_dev, fdb_info, + check_cb, foreign_dev_check_cb, + del_cb, lag_del_cb); } int switchdev_handle_fdb_del_to_device(struct net_device *dev, -- cgit v1.2.3 From 957e2235e5264c97cd6be8e2e17f2e11b41f2239 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 3 Aug 2021 23:34:08 +0300 Subject: net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded"), we started having Ethernet switch drivers calling directly into a function exported by net/bridge/br_switchdev.c, which is a function exported by the bridge driver. This means that drivers that did not have an explicit dependency on the bridge before, like cpsw and am65-cpsw, now do - otherwise it is not possible to call a symbol exported by a driver that can be built as module unless you are a module too. There was an attempt to solve the dependency issue in the form of commit b0e81817629a ("net: build all switchdev drivers as modules when the bridge is a module"). Grygorii Strashko, however, says about it: | In my opinion, the problem is a bit bigger here than just fixing the | build :( | | In case, of ^cpsw the switchdev mode is kinda optional and in many | cases (especially for testing purposes, NFS) the multi-mac mode is | still preferable mode. | | There were no such tight dependency between switchdev drivers and | bridge core before and switchdev serviced as independent, notification | based layer between them, so ^cpsw still can be "Y" and bridge can be | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE | will need to be set as "Y", or we will have to update drivers to | support build with BRIDGE=n and maintain separate builds for | networking vs non-networking testing. But is this enough? Wouldn't | it cause 'chain reaction' required to add more and more "Y" options | (like CONFIG_VLAN_8021Q)? | | PS. Just to be sure we on the same page - ARM builds will be forced | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our | automation testing will just fail with omap2plus_defconfig. In the light of this, it would be desirable for some configurations to avoid dependencies between switchdev drivers and the bridge, and have the switchdev mode as completely optional within the driver. Arnd Bergmann also tried to write a patch which better expressed the build time dependency for Ethernet switch drivers where the switchdev support is optional, like cpsw/am65-cpsw, and this made the drivers follow the bridge (compile as module if the bridge is a module) only if the optional switchdev support in the driver was enabled in the first place: https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/ but this still did not solve the fact that cpsw and am65-cpsw now must be built as modules when the bridge is a module - it just expressed correctly that optional dependency. But the new behavior is an apparent regression from Grygorii's perspective. So to support the use case where the Ethernet driver is built-in, NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we need a framework that can handle the possible absence of the bridge from the running system, i.e. runtime bloatware as opposed to build-time bloatware. Luckily we already have this framework, since switchdev has been using it extensively. Events from the bridge side are transmitted to the driver side using notifier chains - this was originally done so that unrelated drivers could snoop for events emitted by the bridge towards ports that are implemented by other drivers (think of a switch driver with LAG offload that listens for switchdev events on a bonding/team interface that it offloads). There are also events which are transmitted from the driver side to the bridge side, which again are modeled using notifiers. SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with notifying the bridge that a MAC address has been dynamically learned. So there is a precedent we can use for modeling the new framework. The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work that the bridge needs to do when a port becomes offloaded is blocking in its nature: replay VLANs, MDBs etc. The calling context is indeed blocking (we are under rtnl_mutex), but the existing switchdev notification chain that the bridge is subscribed to is only the atomic one. So we need to subscribe the bridge to the blocking switchdev notification chain too. This patch: - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload unchanged - moves the implementation of switchdev_bridge_port_{,un}offload from the bridge module into the switchdev module. - makes everybody that is subscribed to the switchdev blocking notifier chain "hear" offload & unoffload events - makes the bridge driver subscribe and handle those events - moves the bridge driver's handling of those events into 2 new functions called br_switchdev_port_{,un}offload. These functions contain in fact the core of the logic that was previously in switchdev_bridge_port_{,un}offload, just that now we go through an extra indirection layer to reach them. Unlike all the other switchdev notification structures, the structure used to carry the bridge port information, struct switchdev_notifier_brport_info, does not contain a "bool handled". This is because in the current usage pattern, we always know that a switchdev bridge port offloading event will be handled by the bridge, because the switchdev_bridge_port_offload() call was initiated by a NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event couldn't have happened. Signed-off-by: Vladimir Oltean Tested-by: Grygorii Strashko Signed-off-by: David S. Miller --- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +- drivers/net/ethernet/ti/cpsw_new.c | 2 +- include/linux/if_bridge.h | 35 ---------------------- include/net/switchdev.h | 46 ++++++++++++++++++++++++++++ net/bridge/br.c | 51 +++++++++++++++++++++++++++++++- net/bridge/br_private.h | 29 ++++++++++++++++++ net/bridge/br_switchdev.c | 36 ++++++---------------- net/switchdev/switchdev.c | 48 ++++++++++++++++++++++++++++++ 8 files changed, 184 insertions(+), 65 deletions(-) (limited to 'net/switchdev/switchdev.c') diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index 4f67d1a98c0d..fb5d2ac3f0d2 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -7,7 +7,6 @@ #include #include -#include #include #include #include @@ -28,6 +27,7 @@ #include #include #include +#include #include "cpsw_ale.h" #include "cpsw_sl.h" diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c index b4f55ff4e84f..ae167223e87f 100644 --- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -11,7 +11,6 @@ #include #include #include -#include #include #include #include @@ -29,6 +28,7 @@ #include #include +#include #include #include #include diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 21daed10322e..509e18c7e740 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -190,39 +190,4 @@ static inline clock_t br_get_ageing_time(const struct net_device *br_dev) } #endif -#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_NET_SWITCHDEV) - -int switchdev_bridge_port_offload(struct net_device *brport_dev, - struct net_device *dev, const void *ctx, - struct notifier_block *atomic_nb, - struct notifier_block *blocking_nb, - bool tx_fwd_offload, - struct netlink_ext_ack *extack); -void switchdev_bridge_port_unoffload(struct net_device *brport_dev, - const void *ctx, - struct notifier_block *atomic_nb, - struct notifier_block *blocking_nb); - -#else - -static inline int -switchdev_bridge_port_offload(struct net_device *brport_dev, - struct net_device *dev, const void *ctx, - struct notifier_block *atomic_nb, - struct notifier_block *blocking_nb, - bool tx_fwd_offload, - struct netlink_ext_ack *extack) -{ - return -EINVAL; -} - -static inline void -switchdev_bridge_port_unoffload(struct net_device *brport_dev, - const void *ctx, - struct notifier_block *atomic_nb, - struct notifier_block *blocking_nb) -{ -} -#endif - #endif diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 66468ff8cc0a..60d806b6a5ae 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -180,6 +180,14 @@ struct switchdev_obj_in_state_mrp { typedef int switchdev_obj_dump_cb_t(struct switchdev_obj *obj); +struct switchdev_brport { + struct net_device *dev; + const void *ctx; + struct notifier_block *atomic_nb; + struct notifier_block *blocking_nb; + bool tx_fwd_offload; +}; + enum switchdev_notifier_type { SWITCHDEV_FDB_ADD_TO_BRIDGE = 1, SWITCHDEV_FDB_DEL_TO_BRIDGE, @@ -197,6 +205,9 @@ enum switchdev_notifier_type { SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE, SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE, SWITCHDEV_VXLAN_FDB_OFFLOADED, + + SWITCHDEV_BRPORT_OFFLOADED, + SWITCHDEV_BRPORT_UNOFFLOADED, }; struct switchdev_notifier_info { @@ -226,6 +237,11 @@ struct switchdev_notifier_port_attr_info { bool handled; }; +struct switchdev_notifier_brport_info { + struct switchdev_notifier_info info; /* must be first */ + const struct switchdev_brport brport; +}; + static inline struct net_device * switchdev_notifier_info_to_dev(const struct switchdev_notifier_info *info) { @@ -246,6 +262,17 @@ switchdev_fdb_is_dynamically_learned(const struct switchdev_notifier_fdb_info *f #ifdef CONFIG_NET_SWITCHDEV +int switchdev_bridge_port_offload(struct net_device *brport_dev, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + bool tx_fwd_offload, + struct netlink_ext_ack *extack); +void switchdev_bridge_port_unoffload(struct net_device *brport_dev, + const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb); + void switchdev_deferred_process(void); int switchdev_port_attr_set(struct net_device *dev, const struct switchdev_attr *attr, @@ -316,6 +343,25 @@ int switchdev_handle_port_attr_set(struct net_device *dev, struct netlink_ext_ack *extack)); #else +static inline int +switchdev_bridge_port_offload(struct net_device *brport_dev, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + bool tx_fwd_offload, + struct netlink_ext_ack *extack) +{ + return -EOPNOTSUPP; +} + +static inline void +switchdev_bridge_port_unoffload(struct net_device *brport_dev, + const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb) +{ +} + static inline void switchdev_deferred_process(void) { } diff --git a/net/bridge/br.c b/net/bridge/br.c index 8fb5dca5f8e0..d3a32c6813e0 100644 --- a/net/bridge/br.c +++ b/net/bridge/br.c @@ -201,6 +201,48 @@ static struct notifier_block br_switchdev_notifier = { .notifier_call = br_switchdev_event, }; +/* called under rtnl_mutex */ +static int br_switchdev_blocking_event(struct notifier_block *nb, + unsigned long event, void *ptr) +{ + struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); + struct net_device *dev = switchdev_notifier_info_to_dev(ptr); + struct switchdev_notifier_brport_info *brport_info; + const struct switchdev_brport *b; + struct net_bridge_port *p; + int err = NOTIFY_DONE; + + p = br_port_get_rtnl(dev); + if (!p) + goto out; + + switch (event) { + case SWITCHDEV_BRPORT_OFFLOADED: + brport_info = ptr; + b = &brport_info->brport; + + err = br_switchdev_port_offload(p, b->dev, b->ctx, + b->atomic_nb, b->blocking_nb, + b->tx_fwd_offload, extack); + err = notifier_from_errno(err); + break; + case SWITCHDEV_BRPORT_UNOFFLOADED: + brport_info = ptr; + b = &brport_info->brport; + + br_switchdev_port_unoffload(p, b->ctx, b->atomic_nb, + b->blocking_nb); + break; + } + +out: + return err; +} + +static struct notifier_block br_switchdev_blocking_notifier = { + .notifier_call = br_switchdev_blocking_event, +}; + /* br_boolopt_toggle - change user-controlled boolean option * * @br: bridge device @@ -355,10 +397,14 @@ static int __init br_init(void) if (err) goto err_out4; - err = br_netlink_init(); + err = register_switchdev_blocking_notifier(&br_switchdev_blocking_notifier); if (err) goto err_out5; + err = br_netlink_init(); + if (err) + goto err_out6; + brioctl_set(br_ioctl_stub); #if IS_ENABLED(CONFIG_ATM_LANE) @@ -373,6 +419,8 @@ static int __init br_init(void) return 0; +err_out6: + unregister_switchdev_blocking_notifier(&br_switchdev_blocking_notifier); err_out5: unregister_switchdev_notifier(&br_switchdev_notifier); err_out4: @@ -392,6 +440,7 @@ static void __exit br_deinit(void) { stp_proto_unregister(&br_stp_proto); br_netlink_fini(); + unregister_switchdev_blocking_notifier(&br_switchdev_blocking_notifier); unregister_switchdev_notifier(&br_switchdev_notifier); unregister_netdevice_notifier(&br_device_notifier); brioctl_set(NULL); diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index c939631428b9..10d43bf4bb80 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -1880,6 +1880,17 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; } /* br_switchdev.c */ #ifdef CONFIG_NET_SWITCHDEV +int br_switchdev_port_offload(struct net_bridge_port *p, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + bool tx_fwd_offload, + struct netlink_ext_ack *extack); + +void br_switchdev_port_unoffload(struct net_bridge_port *p, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb); + bool br_switchdev_frame_uses_tx_fwd_offload(struct sk_buff *skb); void br_switchdev_frame_set_offload_fwd_mark(struct sk_buff *skb); @@ -1908,6 +1919,24 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb) skb->offload_fwd_mark = 0; } #else +static inline int +br_switchdev_port_offload(struct net_bridge_port *p, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + bool tx_fwd_offload, + struct netlink_ext_ack *extack) +{ + return -EOPNOTSUPP; +} + +static inline void +br_switchdev_port_unoffload(struct net_bridge_port *p, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb) +{ +} + static inline bool br_switchdev_frame_uses_tx_fwd_offload(struct sk_buff *skb) { return false; diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c index 36d75fd4a80c..6bf518d78f02 100644 --- a/net/bridge/br_switchdev.c +++ b/net/bridge/br_switchdev.c @@ -312,23 +312,16 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p, /* Let the bridge know that this port is offloaded, so that it can assign a * switchdev hardware domain to it. */ -int switchdev_bridge_port_offload(struct net_device *brport_dev, - struct net_device *dev, const void *ctx, - struct notifier_block *atomic_nb, - struct notifier_block *blocking_nb, - bool tx_fwd_offload, - struct netlink_ext_ack *extack) +int br_switchdev_port_offload(struct net_bridge_port *p, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + bool tx_fwd_offload, + struct netlink_ext_ack *extack) { struct netdev_phys_item_id ppid; - struct net_bridge_port *p; int err; - ASSERT_RTNL(); - - p = br_port_get_rtnl(brport_dev); - if (!p) - return -ENODEV; - err = dev_get_port_parent_id(dev, &ppid, false); if (err) return err; @@ -348,23 +341,12 @@ out_switchdev_del: return err; } -EXPORT_SYMBOL_GPL(switchdev_bridge_port_offload); -void switchdev_bridge_port_unoffload(struct net_device *brport_dev, - const void *ctx, - struct notifier_block *atomic_nb, - struct notifier_block *blocking_nb) +void br_switchdev_port_unoffload(struct net_bridge_port *p, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb) { - struct net_bridge_port *p; - - ASSERT_RTNL(); - - p = br_port_get_rtnl(brport_dev); - if (!p) - return; - nbp_switchdev_unsync_objs(p, ctx, atomic_nb, blocking_nb); nbp_switchdev_del(p); } -EXPORT_SYMBOL_GPL(switchdev_bridge_port_unoffload); diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c index 0ae3478561f4..0b2c18efc079 100644 --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@ -809,3 +809,51 @@ int switchdev_handle_port_attr_set(struct net_device *dev, return err; } EXPORT_SYMBOL_GPL(switchdev_handle_port_attr_set); + +int switchdev_bridge_port_offload(struct net_device *brport_dev, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + bool tx_fwd_offload, + struct netlink_ext_ack *extack) +{ + struct switchdev_notifier_brport_info brport_info = { + .brport = { + .dev = dev, + .ctx = ctx, + .atomic_nb = atomic_nb, + .blocking_nb = blocking_nb, + .tx_fwd_offload = tx_fwd_offload, + }, + }; + int err; + + ASSERT_RTNL(); + + err = call_switchdev_blocking_notifiers(SWITCHDEV_BRPORT_OFFLOADED, + brport_dev, &brport_info.info, + extack); + return notifier_to_errno(err); +} +EXPORT_SYMBOL_GPL(switchdev_bridge_port_offload); + +void switchdev_bridge_port_unoffload(struct net_device *brport_dev, + const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb) +{ + struct switchdev_notifier_brport_info brport_info = { + .brport = { + .ctx = ctx, + .atomic_nb = atomic_nb, + .blocking_nb = blocking_nb, + }, + }; + + ASSERT_RTNL(); + + call_switchdev_blocking_notifiers(SWITCHDEV_BRPORT_UNOFFLOADED, + brport_dev, &brport_info.info, + NULL); +} +EXPORT_SYMBOL_GPL(switchdev_bridge_port_unoffload); -- cgit v1.2.3