<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/switchdev.h, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-03-01T12:35:06+00:00</updated>
<entry>
<title>net: bridge: switchdev: Skip MDB replays of deferred events on offload</title>
<updated>2024-03-01T12:35:06+00:00</updated>
<author>
<name>Tobias Waldekranz</name>
<email>tobias@waldekranz.com</email>
</author>
<published>2024-02-14T21:40:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=603be95437e7fd85ba694e75918067fb9e7754db'/>
<id>urn:sha1:603be95437e7fd85ba694e75918067fb9e7754db</id>
<content type='text'>
[ Upstream commit dc489f86257cab5056e747344f17a164f63bff4b ]

Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.

While new memberships are immediately visible to walkers of
br-&gt;mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.

The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.

This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br-&gt;mdb_list, so no duplicates can be generated in
that scenario.

To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.

For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:

    root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 &amp;&amp; \
    &gt; ip link set dev x3 up master br0

And then destroy the bridge:

    root@infix-06-0b-00:~$ ip link del dev br0
    root@infix-06-0b-00:~$ mvls atu
    ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
    DEV:0 Marvell 88E6393X
    33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
    root@infix-06-0b-00:~$

The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.

Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:

    root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 &amp;&amp; \
    &gt; ip link set dev x3 up master br1

All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).

Eliminate the race in two steps:

1. Grab the write-side lock of the MDB while generating the replay
   list.

This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:

2. Make sure that no deferred version of a replay event is already
   enqueued to the switchdev deferred queue, before adding it to the
   replay list, when replaying additions.

Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries")
Signed-off-by: Tobias Waldekranz &lt;tobias@waldekranz.com&gt;
Reviewed-by: Vladimir Oltean &lt;olteanv@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set()</title>
<updated>2023-08-09T20:12:15+00:00</updated>
<author>
<name>Yue Haibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2023-08-08T14:59:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a76728719c85eaa8440c353490c639655d60d28f'/>
<id>urn:sha1:a76728719c85eaa8440c353490c639655d60d28f</id>
<content type='text'>
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
removed the implementation but leave declaration.

Signed-off-by: Yue Haibing &lt;yuehaibing@huawei.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: switchdev: Remove unused typedef switchdev_obj_dump_cb_t()</title>
<updated>2023-08-02T19:28:26+00:00</updated>
<author>
<name>Yue Haibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2023-08-01T14:42:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f85b1c7da776d0cb2b4509bdd7f406fe5607930b'/>
<id>urn:sha1:f85b1c7da776d0cb2b4509bdd7f406fe5607930b</id>
<content type='text'>
Commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev")
leave this unused.

Signed-off-by: Yue Haibing &lt;yuehaibing@huawei.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://lore.kernel.org/r/20230801144209.27512-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: switchdev: Add a helper to replay objects on a bridge port</title>
<updated>2023-07-21T07:54:03+00:00</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2023-07-19T11:01:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2e2857b352277a451e2f91409e461fa7ebf2d15'/>
<id>urn:sha1:f2e2857b352277a451e2f91409e461fa7ebf2d15</id>
<content type='text'>
When a front panel joins a bridge via another netdevice (typically a LAG),
the driver needs to learn about the objects configured on the bridge port.
When the bridge port is offloaded by the driver for the first time, this
can be achieved by passing a notifier to switchdev_bridge_port_offload().
The notifier is then invoked for the individual objects (such as VLANs)
configured on the bridge, and can look for the interesting ones.

Calling switchdev_bridge_port_offload() when the second port joins the
bridge lower is unnecessary, but the replay is still needed. To that end,
add a new function, switchdev_bridge_port_replay(), which does only the
replay part of the _offload() function in exactly the same way as that
function.

Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Cc: Ivan Vecera &lt;ivecera@redhat.com&gt;
Cc: Roopa Prabhu &lt;roopa@nvidia.com&gt;
Cc: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: Danielle Ratson &lt;danieller@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bridge: switchdev: Allow device drivers to install locked FDB entries</title>
<updated>2022-11-10T03:06:13+00:00</updated>
<author>
<name>Hans J. Schultz</name>
<email>netdev@kapio-technology.com</email>
</author>
<published>2022-11-08T10:47:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=27fabd02abf30a9df9899f92d467591c7eabb1ba'/>
<id>urn:sha1:27fabd02abf30a9df9899f92d467591c7eabb1ba</id>
<content type='text'>
When the bridge is offloaded to hardware, FDB entries are learned and
aged-out by the hardware. Some device drivers synchronize the hardware
and software FDBs by generating switchdev events towards the bridge.

When a port is locked, the hardware must not learn autonomously, as
otherwise any host will blindly gain authorization. Instead, the
hardware should generate events regarding hosts that are trying to gain
authorization and their MAC addresses should be notified by the device
driver as locked FDB entries towards the bridge driver.

Allow device drivers to notify the bridge driver about such entries by
extending the 'switchdev_notifier_fdb_info' structure with the 'locked'
bit. The bit can only be set by device drivers and not by the bridge
driver.

Prevent a locked entry from being installed if MAB is not enabled on the
bridge port.

If an entry already exists in the bridge driver, reject the locked entry
if the current entry does not have the "locked" flag set or if it points
to a different port. The same semantics are implemented in the software
data path.

Signed-off-by: Hans J. Schultz &lt;netdev@kapio-technology.com&gt;
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: Vladimir Oltean &lt;vladimir.oltean@nxp.com&gt;
Acked-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: switchdev: add reminder near struct switchdev_notifier_fdb_info</title>
<updated>2022-06-30T03:37:36+00:00</updated>
<author>
<name>Vladimir Oltean</name>
<email>vladimir.oltean@nxp.com</email>
</author>
<published>2022-06-28T10:08:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3eb4a4c3442c0642feaf466ecf6fe3cfb4af2c43'/>
<id>urn:sha1:3eb4a4c3442c0642feaf466ecf6fe3cfb4af2c43</id>
<content type='text'>
br_switchdev_fdb_notify() creates an on-stack FDB info variable, and
initializes it member by member. As such, newly added fields which are
not initialized by br_switchdev_fdb_notify() will contain junk bytes
from the stack.

Other uses of struct switchdev_notifier_fdb_info have a struct
initializer which should put zeroes in the uninitialized fields.

Add a reminder above the structure for future developers. Recently
discussed during review.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-2-schultz.hans+netdev@gmail.com/#24877698
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-3-schultz.hans+netdev@gmail.com/#24912269
Signed-off-by: Vladimir Oltean &lt;vladimir.oltean@nxp.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Link: https://lore.kernel.org/r/20220628100831.2899434-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: bridge: mst: Notify switchdev drivers of MST state changes</title>
<updated>2022-03-17T23:49:58+00:00</updated>
<author>
<name>Tobias Waldekranz</name>
<email>tobias@waldekranz.com</email>
</author>
<published>2022-03-16T15:08:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ae9147f4312903b97eae231c48571bbd95dc63f'/>
<id>urn:sha1:7ae9147f4312903b97eae231c48571bbd95dc63f</id>
<content type='text'>
Generate a switchdev notification whenever an MST state changes. This
notification is keyed by the VLANs MSTI rather than the VID, since
multiple VLANs may share the same MST instance.

Signed-off-by: Tobias Waldekranz &lt;tobias@waldekranz.com&gt;
Acked-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations</title>
<updated>2022-03-17T23:49:58+00:00</updated>
<author>
<name>Tobias Waldekranz</name>
<email>tobias@waldekranz.com</email>
</author>
<published>2022-03-16T15:08:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6284c723d9b9995cc27ab3c6368a9d95d67111ff'/>
<id>urn:sha1:6284c723d9b9995cc27ab3c6368a9d95d67111ff</id>
<content type='text'>
Whenever a VLAN moves to a new MSTI, send a switchdev notification so
that switchdevs can track a bridge's VID to MSTI mappings.

Signed-off-by: Tobias Waldekranz &lt;tobias@waldekranz.com&gt;
Acked-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: bridge: mst: Notify switchdev drivers of MST mode changes</title>
<updated>2022-03-17T23:49:57+00:00</updated>
<author>
<name>Tobias Waldekranz</name>
<email>tobias@waldekranz.com</email>
</author>
<published>2022-03-16T15:08:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=87c167bb94ee3fd49569d4aa2038b9b8840d906f'/>
<id>urn:sha1:87c167bb94ee3fd49569d4aa2038b9b8840d906f</id>
<content type='text'>
Trigger a switchdev event whenever the bridge's MST mode is
enabled/disabled. This allows constituent ports to either perform any
required hardware config, or refuse the change if it not supported.

Signed-off-by: Tobias Waldekranz &lt;tobias@waldekranz.com&gt;
Acked-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device</title>
<updated>2022-02-25T05:31:43+00:00</updated>
<author>
<name>Vladimir Oltean</name>
<email>vladimir.oltean@nxp.com</email>
</author>
<published>2022-02-23T14:00:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ec638740fce990ad2b9af43ead8088d6d6eb2145'/>
<id>urn:sha1:ec638740fce990ad2b9af43ead8088d6d6eb2145</id>
<content type='text'>
When the switchdev_handle_fdb_event_to_device() event replication helper
was created, my original thought was that FDB events on LAG interfaces
should most likely be special-cased, not just replicated towards all
switchdev ports beneath that LAG. So this replication helper currently
does not recurse through switchdev lower interfaces of LAG bridge ports,
but rather calls the lag_mod_cb() if that was provided.

No switchdev driver uses this helper for FDB events on LAG interfaces
yet, so that was an assumption which was yet to be tested. It is
certainly usable for that purpose, as my RFC series shows:

https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/

however this approach is slightly convoluted because:

- the switchdev driver gets a "dev" that isn't its own net device, but
  rather the LAG net device. It must call switchdev_lower_dev_find(dev)
  in order to get a handle of any of its own net devices (the ones that
  pass check_cb).

- in order for FDB entries on LAG ports to be correctly refcounted per
  the number of switchdev ports beneath that LAG, we haven't escaped the
  need to iterate through the LAG's lower interfaces. Except that is now
  the responsibility of the switchdev driver, because the replication
  helper just stopped half-way.

So, even though yes, FDB events on LAG bridge ports must be
special-cased, in the end it's simpler to let switchdev_handle_fdb_*
just iterate through the LAG port's switchdev lowers, and let the
switchdev driver figure out that those physical ports are under a LAG.

The switchdev_handle_fdb_event_to_device() helper takes a
"foreign_dev_check" callback so it can figure out whether @dev can
autonomously forward to @foreign_dev. DSA fills this method properly:
if the LAG is offloaded by another port in the same tree as @dev, then
it isn't foreign. If it is a software LAG, it is foreign - forwarding
happens in software.

Whether an interface is foreign or not decides whether the replication
helper will go through the LAG's switchdev lowers or not. Since the
lan966x doesn't properly fill this out, FDB events on software LAG
uppers will get called. By changing lan966x_foreign_dev_check(), we can
suppress them.

Whereas DSA will now start receiving FDB events for its offloaded LAG
uppers, so we need to return -EOPNOTSUPP, since we currently don't do
the right thing for them.

Cc: Horatiu Vultur &lt;horatiu.vultur@microchip.com&gt;
Signed-off-by: Vladimir Oltean &lt;vladimir.oltean@nxp.com&gt;
Reviewed-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
