summaryrefslogtreecommitdiff
path: root/net/bridge/br_vlan.c
AgeCommit message (Collapse)AuthorFilesLines
2021-07-23net: bridge: switchdev: allow the TX data plane forwarding to be offloadedTobias Waldekranz1-1/+9
Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done between bridge ports. This means that the bridge will only send a single skb towards one of the ports under the switchdev's control, and expects the driver to deliver the packet to all eligible ports in its domain. Primarily this improves the performance of multicast flows with multiple subscribers, as it allows the hardware to perform the frame replication. The basic flow between the driver and the bridge is as follows: - When joining a bridge port, the switchdev driver calls switchdev_bridge_port_offload() with tx_fwd_offload = true. - The bridge sends offloadable skbs to one of the ports under the switchdev's control using skb->offload_fwd_mark = true. - The switchdev driver checks the skb->offload_fwd_mark field and lets its FDB lookup select the destination port mask for this packet. v1->v2: - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long - introduce a static key "br_switchdev_fwd_offload_used" to minimize the impact of the newly introduced feature on all the setups which don't have hardware that can make use of it - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache line access - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in __br_forward() - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge is being used - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP v2->v3: - replace the solution based on .ndo_dfwd_add_station with a solution based on switchdev_bridge_port_offload - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD v3->v4: rebase v4->v5: - make sure the static key is decremented on bridge port unoffload - more function and variable renaming and comments for them: br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload fwd_accel to tx_fwd_offload Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-22net: bridge: move the switchdev object replay helpers to "push" modeVladimir Oltean1-1/+0
Starting with commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances: - an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port. Also, since commit 0d2cfbd41c4a ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void *ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG. With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers. To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev). Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays. Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void *ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler. Also unexport the replay helpers, since nobody except the bridge calls them directly now. Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch. We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them: - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge. - br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality. - br_mdb_replay() should fix a real issue which is described in commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay(). So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/ Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-22net: bridge: guard the switchdev replay helpers against a NULL notifier blockVladimir Oltean1-0/+3
There is a desire to make the object and FDB replay helpers optional when moving them inside the bridge driver. For example a certain driver might not offload host MDBs and there is no case where the replay helpers would be of immediate use to it. So it would be nice if we could allow drivers to pass NULL pointers for the atomic and blocking notifier blocks, and the replay helpers to do nothing in that case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: bridge: vlan: add support for dumping global vlan optionsNikolay Aleksandrov1-8/+33
Add a new vlan options dump flag which causes only global vlan options to be dumped. The dumps are done only with bridge devices, ports are ignored. They support vlan compression if the options in sequential vlans are equal (currently always true). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: bridge: vlan: add support for global optionsNikolay Aleksandrov1-3/+13
We can have two types of vlan options depending on context: - per-device vlan options (split in per-bridge and per-port) - global vlan options The second type wasn't supported in the bridge until now, but we need them for per-vlan multicast support, per-vlan STP support and other options which require global vlan context. They are contained in the global bridge vlan context even if the vlan is not configured on the bridge device itself. This patch adds initial netlink attributes and support for setting these global vlan options, they can only be set (RTM_NEWVLAN) and the operation must use the bridge device. Since there are no such options yet it shouldn't have any functional effect. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: bridge: add vlan mcast snooping knobNikolay Aleksandrov1-5/+15
Add a global knob that controls if vlan multicast snooping is enabled. The proper contexts (vlan or bridge-wide) will be chosen based on the knob when processing packets and changing bridge device state. Note that vlans have their individual mcast snooping enabled by default, but this knob is needed to turn on bridge vlan snooping. It is disabled by default. To enable the knob vlan filtering must also be enabled, it doesn't make sense to have vlan mcast snooping without vlan filtering since that would lead to inconsistencies. Disabling vlan filtering will also automatically disable vlan mcast snooping. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: bridge: multicast: add vlan state initialization and controlNikolay Aleksandrov1-0/+4
Add helpers to enable/disable vlan multicast based on its flags, we need two flags because we need to know if the vlan has multicast enabled globally (user-controlled) and if it has it enabled on the specific device (bridge or port). The new private vlan flags are: - BR_VLFLAG_MCAST_ENABLED: locally enabled multicast on the device, used when removing a vlan, toggling vlan mcast snooping and controlling single vlan (kernel-controlled, valid under RTNL and multicast_lock) - BR_VLFLAG_GLOBAL_MCAST_ENABLED: globally enabled multicast for the vlan, used to control the bridge-wide vlan mcast snooping for a single vlan (user-controlled, can be checked under any context) Bridge vlan contexts are created with multicast snooping enabled by default to be in line with the current bridge snooping defaults. In order to actually activate per vlan snooping and context usage a bridge-wide knob will be added later which will default to disabled. If that knob is enabled then automatically all vlan snooping will be enabled. All vlan contexts are initialized with the current bridge multicast context defaults. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: bridge: vlan: add global and per-port multicast contextNikolay Aleksandrov1-0/+4
Add global and per-port vlan multicast context, only initialized but still not used. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: bridge: allow the switchdev replay functions to be called for deletionVladimir Oltean1-4/+11
When a switchdev port leaves a LAG that is a bridge port, the switchdev objects and port attributes offloaded to that port are not removed: ip link add br0 type bridge ip link add bond0 type bond mode 802.3ad ip link set swp0 master bond0 ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 nomaster VLAN 100 will remain installed on swp0 despite it going into standalone mode, because as far as the bridge is concerned, nothing ever happened to its bridge port. Let's extend the bridge vlan, fdb and mdb replay functions to take a 'bool adding' argument, and make DSA and ocelot call the replay functions with 'adding' as false from the switchdev unsync path, for the switch port that leaves the bridge. Note that this patch in itself does not salvage anything, because in the current pull mode of operation, DSA still needs to call the replay helpers with adding=false. This will be done in another patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: bridge: ignore switchdev events for LAG ports which didn't request replayVladimir Oltean1-3/+5
There is a slight inconvenience in the switchdev replay helpers added recently, and this is when: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 master bond0 ip link set swp1 master bond0 Since the underlying driver (currently only DSA) asks for a replay of VLANs when swp0 and swp1 join the LAG because it is bridged, what will happen is that DSA will try to react twice on the VLAN event for swp0. This is not really a huge problem right now, because most drivers accept duplicates since the bridge itself does, but it will become a problem when we add support for replaying switchdev object deletions. Let's fix this by adding a blank void *ctx in the replay helpers, which will be passed on by the bridge in the switchdev notifications. If the context is NULL, everything is the same as before. But if the context is populated with a valid pointer, the underlying switchdev driver (currently DSA) can use the pointer to 'see through' the bridge port (which in the example above is bond0) and 'know' that the event is only for a particular physical port offloading that bridge port, and not for all of them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18net: bridge: remove redundant continue statementColin Ian King1-3/+1
The continue statement at the end of a for-loop has no effect, invert the if expression and remove the continue. Addresses-Coverity: ("Continue has no effect") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24netfilter: flowtable: bridge vlan hardware offload and switchdevFelix Fietkau1-0/+2
The switch might have already added the VLAN tag through PVID hardware offload. Keep this extra VLAN in the flowtable but skip it on egress. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24net: bridge: resolve forwarding path for VLAN tag actions in bridge devicesFelix Fietkau1-0/+53
Depending on the VLAN settings of the bridge and the port, the bridge can either add or remove a tag. When vlan filtering is enabled, the fdb lookup also needs to know the VLAN tag/proto for the destination address To provide this, keep track of the stack of VLAN tags for the path in the lookup context Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24net: bridge: Fix missing return assignment from br_vlan_replay_one callColin Ian King1-1/+1
The call to br_vlan_replay_one is returning an error return value but this is not being assigned to err and the following check on err is currently always false because err was initialized to zero. Fix this by assigning err. Addresses-Coverity: ("'Constant' variable guards dead code") Fixes: 22f67cdfae6a ("net: bridge: add helper to replay VLANs installed on port") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24net: bridge: add helper to replay VLANs installed on portVladimir Oltean1-0/+73
Currently this simple setup with DSA: ip link add br0 type bridge vlan_filtering 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 will not work because the bridge has created the PVID in br_add_if -> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1, but that was too early, since swp0 was not yet a lower of bond0, so it had no reason to act upon that notification. We need a helper in the bridge to replay the switchdev VLAN objects that were notified since the bridge port creation, because some of them may have been missed. As opposed to the br_mdb_replay function, the vg->vlan_list write side protection is offered by the rtnl_mutex which is sleepable, so we don't need to queue up the objects in atomic context, we can replay them right away. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15net: bridge: propagate extack through switchdev_port_attr_setVladimir Oltean1-6/+7
The benefit is the ability to propagate errors from switchdev drivers for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15net: bridge: propagate extack through store_bridge_parmVladimir Oltean1-4/+7
The bridge sysfs interface stores parameters for the STP, VLAN, multicast etc subsystems using a predefined function prototype. Sometimes the underlying function being called supports a netlink extended ack message, and we ignore it. Let's expand the store_bridge_parm function prototype to include the extack, and just print it to console, but at least propagate it where applicable. Where not applicable, create a shim function in the br_sysfs_br.c file that discards the extra function argument. This patch allows us to propagate the extack argument to br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle, and from there, further up in br_changelink from br_netlink.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15net: bridge: remove __br_vlan_filter_toggleVladimir Oltean1-6/+1
This function is identical with br_vlan_filter_toggle. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-01-19net: bridge: check vlan with eth_type_vlan() methodMenglong Dong1-1/+1
Replace some checks for ETH_P_8021Q and ETH_P_8021AD with eth_type_vlan(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210117080950.122761-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+3
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-05net: bridge: vlan: fix error return code in __vlan_add()Zhang Changzhong1-1/+3
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02bridge: switchdev: Notify about VLAN protocol changesDanielle Ratson1-2/+14
Drivers that support bridge offload need to be notified about changes to the bridge's VLAN protocol so that they could react accordingly and potentially veto the change. Add a new switchdev attribute to communicate the change to drivers. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19net: bridge: replace struct br_vlan_stats with pcpu_sw_netstatsHeiner Kallweit1-7/+8
Struct br_vlan_stats duplicates pcpu_sw_netstats (apart from br_vlan_stats not defining an alignment requirement), therefore switch to using the latter one. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/04d25c3d-c5f6-3611-6d37-c2f40243dae2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-7/+13
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: core: introduce struct netdev_nested_priv for nested interface ↵Taehee Yoo1-7/+13
infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-10/+17
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCUVladimir Oltean1-10/+17
When calling the RCU brother of br_vlan_get_pvid(), lockdep warns: ============================= WARNING: suspicious RCU usage 5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted ----------------------------- net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage! Call trace: lockdep_rcu_suspicious+0xd4/0xf8 __br_vlan_get_pvid+0xc0/0x100 br_vlan_get_pvid_rcu+0x78/0x108 The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group() which calls rtnl_dereference() instead of rcu_dereference(). In turn, rtnl_dereference() calls rcu_dereference_protected() which assumes operation under an RCU write-side critical section, which obviously is not the case here. So, when the incorrect primitive is used to access the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may cause various unexpected problems. I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot share the same implementation. So fix the bug by splitting the 2 functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups under proper locking annotations. Fixes: 7582f5b70f9a ("bridge: add br_vlan_get_pvid_rcu()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-19net: bridge: delete duplicated wordsRandy Dunlap1-1/+1
Drop repeated words in net/bridge/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18netlink: consistently use NLA_POLICY_EXACT_LEN()Johannes Berg1-2/+2
Change places that open-code NLA_POLICY_EXACT_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20net: bridge: vlan options: nest the tunnel id into a tunnel info attributeNikolay Aleksandrov1-1/+1
While discussing the new API, Roopa mentioned that we'll be adding more tunnel attributes and options in the future, so it's better to make it a nested attribute, since this is still in net-next we can easily change it and nest the tunnel id attribute under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO. The new format is: [BRIDGE_VLANDB_ENTRY] [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] [BRIDGE_VLANDB_TINFO_ID] Any new tunnel attributes can be nested under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20net: bridge: vlan: include stats in dumps if requestedNikolay Aleksandrov1-12/+61
This patch adds support for vlan stats to be included when dumping vlan information. We have to dump them only when explicitly requested (thus the flag below) because that disables the vlan range compression and will make the dump significantly larger. In order to request the stats to be included we add a new dump attribute called BRIDGE_VLANDB_DUMP_FLAGS which can affect dumps with the following first flag: - BRIDGE_VLANDB_DUMPF_STATS The stats are intentionally nested and put into separate attributes to make it easier for extending later since we plan to add per-vlan mcast stats, drop stats and possibly STP stats. This is the last missing piece from the new vlan API which makes the dumped vlan information complete. A dump request which should include stats looks like: [BRIDGE_VLANDB_DUMP_FLAGS] |= BRIDGE_VLANDB_DUMPF_STATS A vlandb entry attribute with stats looks like: [BRIDGE_VLANDB_ENTRY] = { [BRIDGE_VLANDB_ENTRY_STATS] = { [BRIDGE_VLANDB_STATS_RX_BYTES] [BRIDGE_VLANDB_STATS_RX_PACKETS] ... } } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18net: bridge: vlan options: add support for tunnel mapping set/delNikolay Aleksandrov1-0/+1
This patch adds support for manipulating vlan/tunnel mappings. The tunnel ids are globally unique and are one per-vlan. There were two trickier issues - first in order to support vlan ranges we have to compute the current tunnel id in the following way: - base tunnel id (attr) + current vlan id - starting vlan id This is in line how the old API does vlan/tunnel mapping with ranges. We already have the vlan range present, so it's redundant to add another attribute for the tunnel range end. It's simply base tunnel id + vlan range. And second to support removing mappings we need an out-of-band way to tell the option manipulating function because there are no special/reserved tunnel id values, so we use a vlan flag to denote the operation is tunnel mapping removal. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18net: bridge: vlan options: rename br_vlan_opts_eq to br_vlan_opts_eq_rangeNikolay Aleksandrov1-1/+1
It is more appropriate name as it shows the intent of why we need to check the options' state. It also allows us to give meaning to the two arguments of the function: the first is the current vlan (v_curr) being checked if it could enter the range ending in the second one (range_end). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add per-vlan stateNikolay Aleksandrov1-12/+35
The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add basic option setting supportNikolay Aleksandrov1-7/+34
This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add basic option dumping supportNikolay Aleksandrov1-6/+14
We'll be dumping the options for the whole range if they're equal. The first range vlan will be used to extract the options. The commit doesn't change anything yet it just adds the skeleton for the support. The dump will happen when the first option is added. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: notify on vlan add/delete/change flagsNikolay Aleksandrov1-15/+56
Now that we can notify, send a notification on add/del or change of flags. Notifications are also compressed when possible to reduce their number and relieve user-space of extra processing, due to that we have to manually notify after each add/del in order to avoid double notifications. We try hard to notify only about the vlans which actually changed, thus a single command can result in multiple notifications about disjoint ranges if there were vlans which didn't change inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtnetlink group and notify supportNikolay Aleksandrov1-0/+79
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm range supportNikolay Aleksandrov1-14/+72
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will be also used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add del rtm message supportNikolay Aleksandrov1-0/+6
Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to map DELVLAN to DELLINK and register the handler. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add new rtm message supportNikolay Aleksandrov1-0/+111
Add initial RTM_NEWVLAN support which can only create vlans, operating similar to the current br_afspec(). We will use it later to also change per-vlan options. Old-style (flag-based) vlan ranges are not allowed when using RTM messages, we will introduce vlan ranges later via a new nested attribute which would allow us to have all the information about a range encapsulated into a single nl attribute. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15net: bridge: vlan: add rtm definitions and dump supportNikolay Aleksandrov1-0/+148
This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: bridge: Populate the pvid flag in br_vlan_get_infoVladimir Oltean1-0/+2
Currently this simplified code snippet fails: br_vlan_get_pvid(netdev, &pvid); br_vlan_get_info(netdev, pvid, &vinfo); ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID)); It is intuitive that the pvid of a netdevice should have the BRIDGE_VLAN_INFO_PVID flag set. However I can't seem to pinpoint a commit where this behavior was introduced. It seems like it's been like that since forever. At a first glance it would make more sense to just handle the BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay explains: There are a few reasons why we don't do it, most importantly because we need to have only one visible pvid at any single time, even if it's stale - it must be just one. Right now that rule will not be violated by this change, but people will try using this flag and could see two pvids simultaneously. You can see that the pvid code is even using memory barriers to propagate the new value faster and everywhere the pvid is read only once. That is the reason the flag is set dynamically when dumping entries, too. A second (weaker) argument against would be given the above we don't want another way to do the same thing, specifically if it can provide us with two pvids (e.g. if walking the vlan list) or if it can provide us with a pvid different from the one set in the vg. [Obviously, I'm talking about RCU pvid/vlan use cases similar to the dumps. The locked cases are fine. I would like to avoid explaining why this shouldn't be relied upon without locking] So instead of introducing the above change and making sure of the pvid uniqueness under RCU, simply dynamically populate the pvid flag in br_vlan_get_info(). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTERNikolay Aleksandrov1-18/+16
Most of the bridge device's vlan init bugs come from the fact that its default pvid is created at the wrong time, way too early in ndo_init() before the device is even assigned an ifindex. It introduces a bug when the bridge's dev_addr is added as fdb during the initial default pvid creation the notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0] and is wrong. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. It makes much more sense to send a notification *after* the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail or to receive it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush from br_vlan_flush() since that case can no longer happen. At NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything depending why it was called (if called due to NETDEV_REGISTER error it'll still be == 1, otherwise it could be any value changed during the device life time). For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER v3: send the correct v2 patch with all changes (stub should return 0) v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in the br_vlan_bridge_event stub when bridge vlans are disabled [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by: michael-dev <michael-dev@fami-braun.de> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: bridge: delete local fdb on device init failureNikolay Aleksandrov1-0/+5
On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init() This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it. Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05bridge: add br_vlan_get_proto()wenxu1-0/+10
This new function allows you to fetch the bridge port vlan protocol. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05bridge: add br_vlan_get_pvid_rcu()Pablo Neira Ayuso1-4/+15
This new function allows you to fetch bridge pvid from packet path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner1-0/+1
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19bridge: update vlan dev link state for bridge netdev changesMike Manning1-3/+47
If vlan bridge binding is enabled, then the link state of a vlan device that is an upper device of the bridge tracks the state of bridge ports that are members of that vlan. But this can only be done when the link state of the bridge is up. If it is down, then the link state of the vlan devices must also be down. This is to maintain existing behavior for when STP is enabled and there are no live ports, in which case the link state for the bridge and any vlan devices is down. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19bridge: update vlan dev state when port added to or deleted from vlanMike Manning1-0/+19
If vlan bridge binding is enabled, then the link state of a vlan device that is an upper device of the bridge should track the state of bridge ports that are members of that vlan. So if a bridge port becomes or stops being a member of a vlan, then update the link state of the vlan device if necessary. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>