Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 000b7287b67555fee39d39fff75229dedde0dcbf upstream.
When an MRD advertisement is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port to the router port list. The multicast lock
protects that list, but it is not acquired in the MRD advertisement case
leading to a race condition, we need to take it to fix the race.
Cc: stable@vger.kernel.org
Cc: linus.luessing@c0d3.blue
Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04bef83a3358946bfc98a5ecebd1b0003d83d882 upstream.
When a PIM hello packet is received on a bridge port with multicast
snooping enabled, we mark it as a router port automatically, that
includes adding that port the router port list. The multicast lock
protects that list, but it is not acquired in the PIM message case
leading to a race condition, we need to take it to fix the race.
Cc: stable@vger.kernel.org
Fixes: 91b02d3d133b ("bridge: mcast: add router port on PIM hello message")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fcb34635854a5a5814227628867ea914a9805384 ]
According to the standard IEC 62439-2, the number of transitions needs
to be counted for each transition 'between' ring state open and ring
state closed and not from open state to closed state.
Therefore fix this for both ring and interconnect ring.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit cfc579f9d89af4ada58c69b03bcaa4887840f3b3 upstream.
The egress tunnel code uses dst_clone() and directly sets the result
which is wrong because the entry might have 0 refcnt or be already deleted,
causing number of problems. It also triggers the WARN_ON() in dst_hold()[1]
when a refcnt couldn't be taken. Fix it by using dst_hold_safe() and
checking if a reference was actually taken before setting the dst.
[1] dmesg WARN_ON log and following refcnt errors
WARNING: CPU: 5 PID: 38 at include/net/dst.h:230 br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
Modules linked in: 8021q garp mrp bridge stp llc bonding ipv6 virtio_net
CPU: 5 PID: 38 Comm: ksoftirqd/5 Kdump: loaded Tainted: G W 5.13.0-rc3+ #360
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
Code: e8 85 bc 01 e1 45 84 f6 74 90 45 31 f6 85 db 48 c7 c7 a0 02 19 a0 41 0f 94 c6 31 c9 31 d2 44 89 f6 e8 64 bc 01 e1 85 db 75 02 <0f> 0b 31 c9 31 d2 44 89 f6 48 c7 c7 70 02 19 a0 e8 4b bc 01 e1 49
RSP: 0018:ffff8881003d39e8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa01902a0
RBP: ffff8881040c6700 R08: 0000000000000000 R09: 0000000000000001
R10: 2ce93d0054fe0d00 R11: 54fe0d00000e0000 R12: ffff888109515000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000401
FS: 0000000000000000(0000) GS:ffff88822bf40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f42ba70f030 CR3: 0000000109926000 CR4: 00000000000006e0
Call Trace:
br_handle_vlan+0xbc/0xca [bridge]
__br_forward+0x23/0x164 [bridge]
deliver_clone+0x41/0x48 [bridge]
br_handle_frame_finish+0x36f/0x3aa [bridge]
? skb_dst+0x2e/0x38 [bridge]
? br_handle_ingress_vlan_tunnel+0x3e/0x1c8 [bridge]
? br_handle_frame_finish+0x3aa/0x3aa [bridge]
br_handle_frame+0x2c3/0x377 [bridge]
? __skb_pull+0x33/0x51
? vlan_do_receive+0x4f/0x36a
? br_handle_frame_finish+0x3aa/0x3aa [bridge]
__netif_receive_skb_core+0x539/0x7c6
? __list_del_entry_valid+0x16e/0x1c2
__netif_receive_skb_list_core+0x6d/0xd6
netif_receive_skb_list_internal+0x1d9/0x1fa
gro_normal_list+0x22/0x3e
dev_gro_receive+0x55b/0x600
? detach_buf_split+0x58/0x140
napi_gro_receive+0x94/0x12e
virtnet_poll+0x15d/0x315 [virtio_net]
__napi_poll+0x2c/0x1c9
net_rx_action+0xe6/0x1fb
__do_softirq+0x115/0x2d8
run_ksoftirqd+0x18/0x20
smpboot_thread_fn+0x183/0x19c
? smpboot_unregister_percpu_thread+0x66/0x66
kthread+0x10a/0x10f
? kthread_mod_delayed_work+0xb6/0xb6
ret_from_fork+0x22/0x30
---[ end trace 49f61b07f775fd2b ]---
dst_release: dst:00000000c02d677a refcnt:-1
dst_release underflow
Cc: stable@vger.kernel.org
Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 58e2071742e38f29f051b709a5cca014ba51166f upstream.
This patch fixes a tunnel_dst null pointer dereference due to lockless
access in the tunnel egress path. When deleting a vlan tunnel the
tunnel_dst pointer is set to NULL without waiting a grace period (i.e.
while it's still usable) and packets egressing are dereferencing it
without checking. Use READ/WRITE_ONCE to annotate the lockless use of
tunnel_id, use RCU for accessing tunnel_dst and make sure it is read
only once and checked in the egress path. The dst is already properly RCU
protected so we don't need to do anything fancy than to make sure
tunnel_id and tunnel_dst are read only once and checked in the egress path.
Cc: stable@vger.kernel.org
Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
IFF_BRIDGE_PORT bit
[ Upstream commit 59259ff7a81b9eb6213891c6451221e567f8f22f ]
There is a crash in the function br_get_link_af_size_filtered,
as the port_exists(dev) is true and the rx_handler_data of dev is NULL.
But the rx_handler_data of dev is correct saved in vmcore.
The oops looks something like:
...
pc : br_get_link_af_size_filtered+0x28/0x1c8 [bridge]
...
Call trace:
br_get_link_af_size_filtered+0x28/0x1c8 [bridge]
if_nlmsg_size+0x180/0x1b0
rtnl_calcit.isra.12+0xf8/0x148
rtnetlink_rcv_msg+0x334/0x370
netlink_rcv_skb+0x64/0x130
rtnetlink_rcv+0x28/0x38
netlink_unicast+0x1f0/0x250
netlink_sendmsg+0x310/0x378
sock_sendmsg+0x4c/0x70
__sys_sendto+0x120/0x150
__arm64_sys_sendto+0x30/0x40
el0_svc_common+0x78/0x130
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
In br_add_if(), we found there is no guarantee that
assigning rx_handler_data to dev->rx_handler_data
will before setting the IFF_BRIDGE_PORT bit of priv_flags.
So there is a possible data competition:
CPU 0: CPU 1:
(RCU read lock) (RTNL lock)
rtnl_calcit() br_add_slave()
if_nlmsg_size() br_add_if()
br_get_link_af_size_filtered() -> netdev_rx_handler_register
...
// The order is not guaranteed
... -> dev->priv_flags |= IFF_BRIDGE_PORT;
// The IFF_BRIDGE_PORT bit of priv_flags has been set
-> if (br_port_exists(dev)) {
// The dev->rx_handler_data has NOT been assigned
-> p = br_port_get_rcu(dev);
....
-> rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
...
Fix it in br_get_link_af_size_filtered, using br_port_get_check_rcu() and checking the return value.
Signed-off-by: Zhang Zhengming <zhangzhengming@huawei.com>
Reviewed-by: Zhao Lei <zhaolei69@huawei.com>
Reviewed-by: Wang Xiaogang <wangxiaogang3@huawei.com>
Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 68f5c12abbc9b6f8c5eea16c62f8b7be70793163 upstream.
When CONFIG_NET_SWITCHDEV is disabled, the shim for switchdev_port_attr_set
inside br_mc_disabled_update returns -EOPNOTSUPP. This is not caught,
and propagated to the caller of br_multicast_add_port, preventing ports
from joining the bridge.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: ae1ea84b33da ("net: bridge: propagate error code and extack from br_mc_disabled_update")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ae1ea84b33dab45c7b6c1754231ebda5959b504c ]
Some Ethernet switches might only be able to support disabling multicast
snooping globally, which is an issue for example when several bridges
span the same physical device and request contradictory settings.
Propagate the return value of br_mc_disabled_update() such that this
limitation is transmitted correctly to user-space.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0353b4a96b7a9f60fe20d1b3ebd4931a4085f91c ]
Recently we had an interop issue where RARP packets got suppressed with
bridge neigh suppression enabled, but the check in the code was meant to
suppress GARP. Exclude RARP packets from it which would allow some VMWare
setups to work, to quote the report:
"Those RARP packets usually get generated by vMware to notify physical
switches when vMotion occurs. vMware may use random sip/tip or just use
sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep
and other times get dropped by the logic"
Reported-by: Amer Abdalamer <amer@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 99014088156cd78867d19514a0bc771c4b86b93b ]
The IPv6 Multicast Router Advertisements parsing has the following two
issues:
For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD
messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes,
assuming MLDv2 Reports with at least one multicast address entry).
When ipv6_mc_check_mld_msg() tries to parse an Multicast Router
Advertisement its MLD length check will fail - and it will wrongly
return -EINVAL, even if we have a valid MRD Advertisement. With the
returned -EINVAL the bridge code will assume a broken packet and will
wrongly discard it, potentially leading to multicast packet loss towards
multicast routers.
The second issue is the MRD header parsing in
br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header
immediately after the IPv6 header (IPv6 next header type). However
according to RFC4286, section 2 all MRD messages contain a Router Alert
option (just like MLD). So instead there is an IPv6 Hop-by-Hop option
for the Router Alert between the IPv6 and ICMPv6 header, again leading
to the bridge wrongly discarding Multicast Router Advertisements.
To fix these two issues, introduce a new return value -ENODATA to
ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop
option which is not an MLD but potentially an MRD packet. This also
simplifies further parsing in the bridge code, as ipv6_mc_check_mld()
already fully checks the ICMPv6 header and hop-by-hop option.
These issues were found and fixed with the help of the mrdisc tool
(https://github.com/troglobit/mrdisc).
Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Just like ip/ip6/arptables, the hooks have to be removed, then
synchronize_rcu() has to be called to make sure no more packets are being
processed before the ruleset data is released.
Place the hook unregistration in the pre_exit hook, then call the new
ebtables pre_exit function from there.
Years ago, when first netns support got added for netfilter+ebtables,
this used an older (now removed) netfilter hook unregister API, that did
a unconditional synchronize_rcu().
Now that all is done with call_rcu, ebtable_{filter,nat,broute} pernet exit
handlers may free the ebtable ruleset while packets are still in flight.
This can only happens on module removal, not during netns exit.
The new function expects the table name, not the table struct.
This is because upcoming patch set (targeting -next) will remove all
net->xt.{nat,filter,broute}_table instances, this makes it necessary
to avoid external references to those member variables.
The existing APIs will be converted, so follow the upcoming scheme of
passing name + hook type instead.
Fixes: aee12a0a3727e ("ebtables: remove nf_hook_register usage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As explained in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
the switchdev notifiers for FDB entries managed to have a zero-day bug.
The bridge would not say that this entry is local:
ip link add br0 type bridge
ip link set swp0 master br0
bridge fdb add dev swp0 00:01:02:03:04:05 master local
and the switchdev driver would be more than happy to offload it as a
normal static FDB entry. This is despite the fact that 'local' and
non-'local' entries have completely opposite directions: a local entry
is locally terminated and not forwarded, whereas a static entry is
forwarded and not locally terminated. So, for example, DSA would install
this entry on swp0 instead of installing it on the CPU port as it should.
There is an even sadder part, which is that the 'local' flag is implicit
if 'static' is not specified, meaning that this command produces the
same result of adding a 'local' entry:
bridge fdb add dev swp0 00:01:02:03:04:05 master
I've updated the man pages for 'bridge', and after reading it now, it
should be pretty clear to any user that the commands above were broken
and should have never resulted in the 00:01:02:03:04:05 address being
forwarded (this behavior is coherent with non-switchdev interfaces):
https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/
If you're a user reading this and this is what you want, just use:
bridge fdb add dev swp0 00:01:02:03:04:05 master static
Because switchdev should have given drivers the means from day one to
classify FDB entries as local/non-local, but didn't, it means that all
drivers are currently broken. So we can just as well omit the switchdev
notifications for local FDB entries, which is exactly what this patch
does to close the bug in stable trees. For further development work
where drivers might want to trap the local FDB entries to the host, we
can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
selectively make drivers act upon that bit, while all the others ignore
those entries if the 'is_local' bit is set.
Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check the return values of the br_mrp_switchdev function.
In case of:
- BR_MRP_NONE, return the error to userspace,
- BR_MRP_SW, continue with SW implementation,
- BR_MRP_HW, continue without SW implementation,
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch extends the br_mrp_switchdev functions to be able to have a
better understanding what cause the issue and if the SW needs to be used
as a backup.
There are the following cases:
- when the code is compiled without CONFIG_NET_SWITCHDEV. In this case
return success so the SW can continue with the protocol. Depending
on the function, it returns 0 or BR_MRP_SW.
- when code is compiled with CONFIG_NET_SWITCHDEV and the driver doesn't
implement any MRP callbacks. In this case the HW can't run MRP so it
just returns -EOPNOTSUPP. So the SW will stop further to configure the
node.
- when code is compiled with CONFIG_NET_SWITCHDEV and the driver fully
supports any MRP functionality. In this case the SW doesn't need to do
anything. The functions will return 0 or BR_MRP_HW.
- when code is compiled with CONFIG_NET_SWITCHDEV and the HW can't run
completely the protocol but it can help the SW to run it. For
example, the HW can't support completely MRM role(can't detect when it
stops receiving MRP Test frames) but it can redirect these frames to
CPU. In this case it is possible to have a SW fallback. The SW will
try initially to call the driver with sw_backup set to false, meaning
that the HW should implement completely the role. If the driver returns
-EOPNOTSUPP, the SW will try again with sw_backup set to false,
meaning that the SW will detect when it stops receiving the frames but
it needs HW support to redirect the frames to CPU. In case the driver
returns 0 then the SW will continue to configure the node accordingly.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the enum br_mrp_hw_support that is used by the br_mrp_switchdev
functions to allow the SW to detect the cases where HW can't implement
the functionality or when SW is used as a backup.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The prototype of br_vlan_filter_toggle was updated to include a netlink
extack, but the stub definition wasn't, which results in a build error
when CONFIG_BRIDGE_VLAN_FILTERING=n.
Fixes: 9e781401cbfc ("net: bridge: propagate extack through store_bridge_parm")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The benefit is the ability to propagate errors from switchdev drivers
for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bridge sysfs interface stores parameters for the STP, VLAN,
multicast etc subsystems using a predefined function prototype.
Sometimes the underlying function being called supports a netlink
extended ack message, and we ignore it.
Let's expand the store_bridge_parm function prototype to include the
extack, and just print it to console, but at least propagate it where
applicable. Where not applicable, create a shim function in the
br_sysfs_br.c file that discards the extra function argument.
This patch allows us to propagate the extack argument to
br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle,
and from there, further up in br_changelink from br_netlink.c.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This function is identical with br_vlan_filter_toggle.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.
This patch also changes drivers to make use of the "mask" field for edge
detection when possible.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the netlink interface, propagate errors through extack rather than
simply printing them to the console. For the sysfs interface, we still
print to the console, but at least that's one layer higher than in
switchdev, which also allows us to silently ignore the offloading of
flags if that is ever needed in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If for example this command:
ip link set swp0 type bridge_slave flood off mcast_flood off learning off
succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
BR_LEARNING, there would be no attempt to revert the partial state in
any way. Arguably, if the user changes more than one flag through the
same netlink command, this one _should_ be all or nothing, which means
it should be passed through switchdev as all or nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
The function br_mrp_port_switchdev_set_state was called both with MRP
port state and STP port state, which is an issue because they don't
match exactly.
Therefore, update the function to be used only with STP port state and
use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE.
The choice of using STP over MRP is that the drivers already implement
SWITCHDEV_ATTR_ID_PORT_STP_STATE and already in SW we update the port
STP state.
Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: fadd409136f0f2 ("bridge: switchdev: mrp: Implement MRP API for switchdev")
Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.")
Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Looking through patchwork I don't see that there was any consensus to
use switchdev notifiers only in case of netlink provided port flags but
not sysfs (as a sort of deprecation, punishment or anything like that),
so we should probably keep the user interface consistent in terms of
functionality.
http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-jiri@resnulli.us/
http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-jiri@resnulli.us/
Fixes: 3922285d96e7 ("net: bridge: Add support for offloading port attributes")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Remove indirection and use nf_ct_get() instead from nfnetlink_log
and nfnetlink_queue, from Florian Westphal.
2) Add weighted random twos choice least-connection scheduling for IPVS,
from Darby Payne.
3) Add a __hash placeholder in the flow tuple structure to identify
the field to be included in the rhashtable key hash calculation.
4) Add a new nft_parse_register_load() and nft_parse_register_store()
to consolidate register load and store in the core.
5) Statify nft_parse_register() since it has no more module clients.
6) Remove redundant assignment in nft_cmp, from Colin Ian King.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
netfilter: nftables: remove redundant assignment of variable err
netfilter: nftables: statify nft_parse_register()
netfilter: nftables: add nft_parse_register_store() and use it
netfilter: nftables: add nft_parse_register_load() and use it
netfilter: flowtable: add hash offset field to tuple
ipvs: add weighted random twos choice algorithm
netfilter: ctnetlink: remove get_ct indirection
====================
Link: https://lore.kernel.org/r/20210206015005.23037-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)).
net/bridge/br_multicast.c:1246:9-16: WARNING: ERR_CAST can be used with mp
Generated by: scripts/coccinelle/api/err_cast.cocci
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Link: https://lore.kernel.org/r/20210204070549.83636-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We're moving to netlink-only options, so add comments in the bridge's
sysfs files to warn against adding any new sysfs entries.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We decided to stop adding new sysfs bridge options and continue with
netlink only, so remove hosts limit sysfs support.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
drivers/net/can/dev.c
b552766c872f ("can: dev: prevent potential information leak in can_fill_info()")
3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
0a042c6ec991 ("can: dev: move netlink related code into seperate file")
Code move.
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
57ac4a31c483 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
214baf22870c ("net/mlx5e: Support HTB offload")
Adjacent code changes
net/switchdev/switchdev.c
20776b465c0c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
ffb68fc58e96 ("net: switchdev: remove the transaction structure from port object notifiers")
bae33f2b5afe ("net: switchdev: remove the transaction structure from port attributes")
Transaction parameter gets dropped otherwise keep the fix.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add two new port attributes which make EHT hosts limit configurable and
export the current number of tracked EHT hosts:
- IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit
- IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts
Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed.
Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've
increased it to 40 to have space for two more future entries.
v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c,
no functional change
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a default limit of 512 for number of tracked EHT hosts per-port.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This new function combines the netlink register attribute parser
and the store validation function.
This update requires to replace:
enum nft_registers dreg:8;
in many of the expression private areas otherwise compiler complains
with:
error: cannot take address of bit-field ‘dreg’
when passing the register field as reference.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fix the messed up indentation in br_multicast_eht_set_entry_lookup().
Fixes: baa74d39ca39 ("net: bridge: multicast: add EHT source set handling functions")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code.
Fix the following coccicheck warnings:
./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be
used.
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
None of these are actually used in the kernel/userspace interface -
there's a userspace component of implementing MRP, and userspace will
need to construct certain frames to put on the wire, but there's no
reason the kernel should provide the relevant definitions in a UAPI
header.
In fact, some of those definitions were broken until previous commit,
so only keep the few that are actually referenced in the kernel code,
and move them to the br_private_mrp.h header.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mark groups which were deleted due to fast leave/EHT.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A block report can result in empty source and host sets for both include
and exclude groups so if there are no hosts left we can safely remove
the group. Pull the block group handling so it can cover both cases and
add a check if EHT requires the delete.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We should be able to handle host filter mode changing. For exclude mode
we must create a zero-src entry so the group will be kept even without
any S,G entries (non-zero source sets). That entry doesn't count to the
entry limit and can always be created, its timer is refreshed on new
exclude reports and if we change the host filter mode to include then it
gets removed and we rely only on the non-zero source sets.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is an optimization specifically for TO_INCLUDE which sends queries
for the older entries and thus lowers the S,G timers to LMQT. If we have
the following situation for a group in either include or exclude mode:
- host A was interested in srcs X and Y, but is timing out
- host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts
to LMQT
- host B sends BLOCK src Z after LMQT time has passed
=> since host B is the last host we can delete the group, but if we
still have host A's EHT entries for X and Y (i.e. if they weren't
lowered to LMQT previously) then we'll have to wait another LMQT
time before deleting the group, with this optimization we can
directly remove it regardless of the group mode as there are no more
interested hosts
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar to
how the reports are processed we have 2 cases when the group is in include
or exclude mode, these are processed as follows:
- group include
- is_include: create missing entries
- to_include: flush existing entries and create a new set from the
report, obviously if the src set is empty then we delete the group
- group exclude
- is_exclude: create missing entries
- to_exclude: flush existing entries and create a new set from the
report, any empty source set entries are removed
If the group is in a different mode then we just flush all entries reported
by the host and we create a new set with the new mode entries created from
the report. If the report is include type, the source list is empty and
the group has empty sources' set then we remove it. Any source set entries
which are empty are removed as well. If the group is in exclude mode it
can exist without any S,G entries (allowing for all traffic to pass).
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how
the reports are processed we have 2 cases when the group is in include
or exclude mode, these are processed as follows:
- group include
- allow: create missing entries
- block: remove existing matching entries and remove the corresponding
S,G entries if there are no more set host entries, then possibly
delete the whole group if there are no more S,G entries
- group exclude
- allow
- host include: create missing entries
- host exclude: remove existing matching entries and remove the
corresponding S,G entries if there are no more set host entries
- block
- host include: remove existing matching entries and remove the
corresponding S,G entries if there are no more set host entries,
then possibly delete the whole group if there are no more S,G entries
- host exclude: create missing entries
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we can delete set entries, we can use that to remove EHT hosts.
Since the group's host set entries exist only when there are related
source set entries we just have to flush all source set entries
joined by the host set entry and it will be automatically removed.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add EHT source set and set-entry create, delete and lookup functions.
These allow to manipulate source sets which contain their own host sets
with entries which joined that S,G. We're limiting the maximum number of
tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is
the current maximum of S,G entries for a group. There's a per-set timer
which will be used to destroy the whole set later.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add functions to create, destroy and lookup an EHT host. These are
per-host entries contained in the eht_host_tree in net_bridge_port_group
which are used to store a list of all sources (S,G) entries joined for that
group by each host, the host's current filter mode and total number of
joined entries.
No functional changes yet, these would be used in later patches.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add EHT structures for tracking hosts and sources per group. We keep one
set for each host which has all of the host's S,G entries, and one set for
each multicast source which has all hosts that have joined that S,G. For
each host, source entry we record the filter_mode and we keep an expiry
timer. There is also one global expiry timer per source set, it is
updated with each set entry update, it will be later used to lower the
set's timer instead of lowering each entry's timer separately.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We need to preserve the srcs pointer since we'll be passing it for EHT
handling later.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare __grp_src_block_incl() for being able to cause a notification
due to changes. Currently it cannot happen, but EHT would change that
since we'll be deleting sources immediately. Make sure that if the pg is
deleted we don't return true as that would cause the caller to access
freed pg. This patch shouldn't cause any functional change.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We need to pass the host address so later it can be used for explicit
host tracking. No functional change.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename src_size argument to addr_size in preparation for passing host
address as an argument to IGMPv3/MLDv2 functions.
No functional change.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|