summaryrefslogtreecommitdiff
path: root/net/bridge
AgeCommit message (Collapse)AuthorFilesLines
2022-05-09rtnetlink: add extack support in fdb del handlersAlaa Mohamed2-2/+4
Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06net: make drivers set the TSO limit not the GSO limitJakub Kicinski1-6/+6
Drivers should call the TSO setting helper, GSO is controllable by user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
include/linux/netdevice.h net/core/dev.c 6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats") 794c24e9921f ("net-core: rx_otherhost_dropped to core_stats") https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/ drivers/net/wan/cosa.c d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()") 89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards") https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-23net: bridge: switchdev: check br_vlan_group() return valueClément Léger1-0/+2
br_vlan_group() can return NULL and thus return value must be checked to avoid dereferencing a NULL pointer. Fixes: 6284c723d9b9 ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220421101247.121896-1-clement.leger@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-13net: bridge: fdb: add support for flush filtering based on ifindex and vlanNikolay Aleksandrov1-1/+44
Add support for fdb flush filtering based on destination ifindex and vlan id. The ifindex must either match a port's device ifindex or the bridge's. The vlan support is trivial since it's already validated by rtnl_fdb_del, we just need to fill it in. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for flush filtering based on ndm flags and stateNikolay Aleksandrov2-3/+60
Add support for fdb flush filtering based on ndm flags and state. NDM state and flags are mapped to bridge-specific flags and matched according to the specified masks. NTF_USE is used to represent added_by_user flag since it sets it on fdb add and we don't have a 1:1 mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are ignored. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for fine-grained flushingNikolay Aleksandrov4-12/+54
Add the ability to specify exactly which fdbs to be flushed. They are described by a new structure - net_bridge_fdb_flush_desc. Currently it can match on port/bridge ifindex, vlan id and fdb flags. It is used to describe the existing dynamic fdb flush operation. Note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them, so currently it can't directly replace deletes which need to handle that case, although we can extend it later for that too. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add ndo_fdb_del_bulkNikolay Aleksandrov3-0/+27
Add a minimal ndo_fdb_del_bulk implementation which flushes all entries. Support for more fine-grained filtering will be added in the following patches. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-12net: bridge: add support for host l2 mdb entriesJoachim Wiberg1-5/+7
This patch expands on the earlier work on layer-2 mdb entries by adding support for host entries. Due to the fact that host joined entries do not have any flag field, we infer the permanent flag when reporting the entries to userspace, which otherwise would be listed as 'temp'. Before patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent Error: bridge: Flags are not allowed for host groups. ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee Error: bridge: Only permanent L2 entries allowed. After patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent ~# bridge mdb show dev br0 port br0 grp 01:00:00:c0:ff:ee permanent vid 1 Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-12net: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICASTArınç ÜNAL1-1/+2
Add BR_HAIRPIN_MODE, BR_ISOLATED and BR_MULTICAST_TO_UNICAST port flags to BR_PORT_FLAGS_HW_OFFLOAD so that switchdev drivers which have an offloaded data plane have a chance to reject these bridge port flags if they don't support them yet. It makes the code path go through the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS driver handlers, which return -EINVAL for everything they don't recognize. For drivers that don't catch SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS at all, switchdev will return -EOPNOTSUPP for those which is then ignored, but those are in the minority. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220410134227.18810-1-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23net: bridge: mst: Restrict info size queries to bridge portsTobias Waldekranz1-1/+1
Ensure that no bridge masters are ever considered for MST info dumping. MST states are only supported on bridge ports, not bridge masters - which br_mst_info_size relies on. Fixes: 122c29486e1f ("net: bridge: mst: Support setting and reporting MST port states") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20220322133001.16181-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22net: bridge: mst: prevent NULL deref in br_mst_info_size()Eric Dumazet1-1/+1
Call br_mst_info_size() only if vg pointer is not NULL. general protection fault, probably for non-canonical address 0xdffffc0000000058: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002c0-0x00000000000002c7] CPU: 0 PID: 975 Comm: syz-executor.0 Tainted: G W 5.17.0-next-20220321-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:br_mst_info_size+0x97/0x270 net/bridge/br_mst.c:242 Code: 00 00 31 c0 e8 ba 10 53 f9 31 c0 b9 40 00 00 00 4c 8d 6c 24 30 4c 89 ef f3 48 ab 48 8d 83 c0 02 00 00 48 89 04 24 48 c1 e8 03 <80> 3c 28 00 0f 85 ae 01 00 00 48 8b 83 c0 02 00 00 41 bf 04 00 00 RSP: 0018:ffffc900153770a8 EFLAGS: 00010202 RAX: 0000000000000058 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff88259876 RDI: ffffc900153772d8 RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffffff8db68957 R10: ffffffff881f737b R11: 0000000000000000 R12: 0000000000000000 R13: ffffc900153770d8 R14: 00000000000002a0 R15: 00000000ffffffff FS: 00007f18bbb6f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001a80 CR3: 000000001a7d9000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 00000000000000d8 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> br_get_link_af_size_filtered+0x6e9/0xc00 net/bridge/br_netlink.c:123 rtnl_link_get_af_size net/core/rtnetlink.c:598 [inline] if_nlmsg_size+0x40c/0xa50 net/core/rtnetlink.c:1040 rtnl_calcit.isra.0+0x25f/0x460 net/core/rtnetlink.c:3780 rtnetlink_rcv_msg+0xa65/0xb80 net/core/rtnetlink.c:5937 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f18baa89049 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f18bbb6f168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f18bab9bf60 RCX: 00007f18baa89049 RDX: 0000000000000000 RSI: 0000000020001a80 RDI: 0000000000000004 RBP: 00007f18baae308d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffeedb2be2f R14: 00007f18bbb6f300 R15: 0000000000022000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:br_mst_info_size+0x97/0x270 net/bridge/br_mst.c:242 Code: 00 00 31 c0 e8 ba 10 53 f9 31 c0 b9 40 00 00 00 4c 8d 6c 24 30 4c 89 ef f3 48 ab 48 8d 83 c0 02 00 00 48 89 04 24 48 c1 e8 03 <80> 3c 28 00 0f 85 ae 01 00 00 48 8b 83 c0 02 00 00 41 bf 04 00 00 RSP: 0018:ffffc900153770a8 EFLAGS: 00010202 RAX: 0000000000000058 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff88259876 RDI: ffffc900153772d8 RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffffff8db68957 R10: ffffffff881f737b R11: 0000000000000000 R12: 0000000000000000 R13: ffffc900153770d8 R14: 00000000000002a0 R15: 00000000ffffffff FS: 00007f18bbb6f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ca22000 CR3: 000000001a7d9000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 00000000000000d8 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 122c29486e1f ("net: bridge: mst: Support setting and reporting MST port states") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tobias Waldekranz <tobias@waldekranz.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220322012314.795187-1-eric.dumazet@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-20netfilter: nft_meta: extend reduce support to bridge familyFlorian Westphal1-0/+1
its enough to export the meta get reduce helper and then call it from nft_meta_bridge too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20netfilter: nf_tables: cancel tracking for clobbered destination registersPablo Neira Ayuso1-2/+2
Output of expressions might be larger than one single register, this might clobber existing data. Reset tracking for all destination registers that required to store the expression output. This patch adds three new helper functions: - nft_reg_track_update: cancel previous register tracking and update it. - nft_reg_track_cancel: cancel any previous register tracking info. - __nft_reg_track_cancel: cancel only one single register tracking info. Partial register clobbering detection is also supported by checking the .num_reg field which describes the number of register that are used. This patch updates the following expressions: - meta_bridge - bitwise - byteorder - meta - payload to use these helper functions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20netfilter: nf_tables: do not reduce read-only expressionsPablo Neira Ayuso1-0/+1
Skip register tracking for expressions that perform read-only operations on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to avoid defining stubs for these expressions. This patch re-enables register tracking which was disabled in ed5f85d42290 ("netfilter: nf_tables: disable register tracking"). Follow up patches add remaining register tracking for existing expressions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-18net: bridge: mst: Add helper to query a port's MST stateTobias Waldekranz1-0/+25
This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Add helper to check if MST is enabledTobias Waldekranz1-0/+9
This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Add helper to map an MSTI to a VID setTobias Waldekranz1-0/+26
br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Notify switchdev drivers of MST state changesTobias Waldekranz1-0/+18
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrationsTobias Waldekranz2-0/+59
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Notify switchdev drivers of MST mode changesTobias Waldekranz1-0/+11
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Support setting and reporting MST port statesTobias Waldekranz3-1/+192
Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Allow changing a VLAN's MSTITobias Waldekranz3-0/+58
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Multiple Spanning Tree (MST) modeTobias Waldekranz8-4/+175
Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski1-1/+1
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack. 2) Reject unsupported families when creating tables, from Phil Sutter. 3) GRE support for the flowtable, from Toshiaki Makita. 4) Add GRE offload support for act_ct, also from Toshiaki. 5) Update mlx5 driver to support for GRE flowtable offload, from Toshiaki Makita. 6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge, from Jiapeng Chong. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bridge: clean up some inconsistent indenting net/mlx5: Support GRE conntrack offload act_ct: Support GRE offload netfilter: flowtable: Support GRE netfilter: nf_tables: Reject tables of unsupported family Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" ==================== Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-07netfilter: bridge: clean up some inconsistent indentingJiapeng Chong1-1/+1
Eliminate the follow smatch warning: net/bridge/netfilter/nf_conntrack_bridge.c:385 nf_ct_bridge_confirm() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-04net: bridge: Use netif_rx().Sebastian Andrzej Siewior1-2/+2
Since commit baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: Add skb_clear_tstamp() to keep the mono delivery_timeMartin KaFai Lau1-1/+1
Right now, skb->tstamp is reset to 0 whenever the skb is forwarded. If skb->tstamp has the mono delivery_time, clearing it can hurt the performance when it finally transmits out to fq@phy-dev. The earlier patch added a skb->mono_delivery_time bit to flag the skb->tstamp carrying the mono delivery_time. This patch adds skb_clear_tstamp() helper which keeps the mono delivery_time and clears everything else. The delivery_time clearing will be postponed until the stack knows the skb will be delivered locally. It will be done in a latter patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: Add skb->mono_delivery_time to distinguish mono delivery_time from ↵Martin KaFai Lau1-2/+3
(rcv) timestamp skb->tstamp was first used as the (rcv) timestamp. The major usage is to report it to the user (e.g. SO_TIMESTAMP). Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP) during egress and used by the qdisc (e.g. sch_fq) to make decision on when the skb can be passed to the dev. Currently, there is no way to tell skb->tstamp having the (rcv) timestamp or the delivery_time, so it is always reset to 0 whenever forwarded between egress and ingress. While it makes sense to always clear the (rcv) timestamp in skb->tstamp to avoid confusing sch_fq that expects the delivery_time, it is a performance issue [0] to clear the delivery_time if the skb finally egress to a fq@phy-dev. For example, when forwarding from egress to ingress and then finally back to egress: tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns ^ ^ reset rest This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp is storing the mono delivery_time (EDT) instead of the (rcv) timestamp. The current use case is to keep the TCP mono delivery_time (EDT) and to be used with sch_fq. A latter patch will also allow tc-bpf@ingress to read and change the mono delivery_time. In the future, another bit (e.g. skb->user_delivery_time) can be added for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid. [ This patch is a prep work. The following patches will get the other parts of the stack ready first. Then another patch after that will finally set the skb->mono_delivery_time. ] skb_set_delivery_time() function is added. It is used by the tcp_output.c and during ip[6] fragmentation to assign the delivery_time to the skb->tstamp and also set the skb->mono_delivery_time. A note on the change in ip_send_unicast_reply() in ip_output.c. It is only used by TCP to send reset/ack out of a ctl_sk. Like the new skb_set_delivery_time(), this patch sets the skb->mono_delivery_time to 0 for now as a place holder. It will be enabled in a latter patch. A similar case in tcp_ipv6 can be done with skb_set_delivery_time() in tcp_v6_send_response(). [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23net: bridge: Add support for offloading of locked port flagHans Schultz1-1/+1
Various switchcores support setting ports in locked mode, so that clients behind locked ports cannot send traffic through the port unless a fdb entry is added with the clients MAC address. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23net: bridge: Add support for bridge port in locked modeHans Schultz2-2/+15
In a 802.1X scenario, clients connected to a bridge port shall not be allowed to have traffic forwarded until fully authenticated. A static fdb entry of the clients MAC address for the bridge port unlocks the client and allows bidirectional communication. This scenario is facilitated with setting the bridge port in locked mode, which is also supported by various switchcore chipsets. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19bridge: switch br_net_exit to batch modeEric Dumazet1-6/+9
cleanup_net() is competing with other rtnl users. Instead of calling br_net_exit() for each netns, call br_net_exit_batch() once. This gives cleanup_net() ability to group more devices and call unregister_netdevice_many() only once for all bridge devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+4
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17net: bridge: multicast: notify switchdev driver whenever MC processing gets ↵Oleksandr Mazur1-0/+4
disabled Whenever bridge driver hits the max capacity of MDBs, it disables the MC processing (by setting corresponding bridge option), but never notifies switchdev about such change (the notifiers are called only upon explicit setting of this option, through the registered netlink interface). This could lead to situation when Software MDB processing gets disabled, but this event never gets offloaded to the underlying Hardware. Fix this by adding a notify message in such case. Fixes: 147c1e9b902c ("switchdev: bridge: Offload multicast disabled") Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16net: bridge: switchdev: replay all VLAN groupsVladimir Oltean1-41/+49
The major user of replayed switchdev objects is DSA, and so far it hasn't needed information about anything other than bridge port VLANs, so this is all that br_switchdev_vlan_replay() knows to handle. DSA has managed to get by through replicating every VLAN addition on a user port such that the same VLAN is also added on all DSA and CPU ports, but there is a corner case where this does not work. The mv88e6xxx DSA driver currently prints this error message as soon as the first port of a switch joins a bridge: mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95 where a6:ef:77:c8:5f:3d vid 1 is a local FDB entry corresponding to the bridge MAC address in the default_pvid. The -EOPNOTSUPP is returned by mv88e6xxx_port_db_load_purge() because it tries to map VID 1 to a FID (the ATU is indexed by FID not VID), but fails to do so. This is because ->port_fdb_add() is called before ->port_vlan_add() for VID 1. The abridged timeline of the calls is: br_add_if -> netdev_master_upper_dev_link -> dsa_port_bridge_join -> switchdev_bridge_port_offload -> br_switchdev_vlan_replay (*) -> br_switchdev_fdb_replay -> mv88e6xxx_port_fdb_add -> nbp_vlan_init -> nbp_vlan_add -> mv88e6xxx_port_vlan_add and the issue is that at the time of (*), the bridge port isn't in VID 1 (nbp_vlan_init hasn't been called), therefore br_switchdev_vlan_replay() won't have anything to replay, therefore VID 1 won't be in the VTU by the time mv88e6xxx_port_fdb_add() is called. This happens only when the first port of a switch joins. For further ports, the initial mv88e6xxx_port_vlan_add() is sufficient for VID 1 to be loaded in the VTU (which is switch-wide, not per port). The problem is somewhat unique to mv88e6xxx by chance, because most other drivers offload an FDB entry by VID, so FDBs and VLANs can be added asynchronously with respect to each other, but addressing the issue at the bridge layer makes sense, since what mv88e6xxx requires isn't absurd. To fix this problem, we need to recognize that it isn't the VLAN group of the port that we're interested in, but the VLAN group of the bridge itself (so it isn't a timing issue, but rather insufficient information being passed from switchdev to drivers). As mentioned, currently nbp_switchdev_sync_objs() only calls br_switchdev_vlan_replay() for VLANs corresponding to the port, but the VLANs corresponding to the bridge itself, for local termination, also need to be replayed. In this case, VID 1 is not (yet) present in the port's VLAN group but is present in the bridge's VLAN group. So to fix this bug, DSA is now obligated to explicitly handle VLANs pointing towards the bridge in order to "close this race" (which isn't really a race). As Tobias Waldekranz notices, this also implies that it must explicitly handle port VLANs on foreign interfaces, something that worked implicitly before: https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260 So in the end, br_switchdev_vlan_replay() must replay all VLANs from all VLAN groups: all the ports, and the bridge itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync()Vladimir Oltean1-2/+2
There may be switchdev drivers that can add/remove a FDB or MDB entry only as long as the VLAN it's in has been notified and offloaded first. The nbp_switchdev_sync_objs() method satisfies this requirement on addition, but nbp_switchdev_unsync_objs() first deletes VLANs, then deletes MDBs and FDBs. Reverse the order of the function calls to cater to this requirement. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: bridge: switchdev: differentiate new VLANs from changed onesVladimir Oltean3-9/+10
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases: - a struct net_bridge_vlan got created - an existing struct net_bridge_vlan was modified This makes it impossible for switchdev drivers to properly balance PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to happen, we must provide a way for drivers to distinguish between a VLAN with changed flags and a new one. Annotate struct switchdev_obj_port_vlan with a "bool changed" that distinguishes the 2 cases above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: bridge: vlan: notify switchdev only when something changedVladimir Oltean1-30/+65
Currently, when a VLAN entry is added multiple times in a row to a bridge port, nbp_vlan_add() calls br_switchdev_port_vlan_add() each time, even if the VLAN already exists and nothing about it has changed: bridge vlan add dev lan12 vid 100 master static Similarly, when a VLAN is added multiple times in a row to a bridge, br_vlan_add_existing() doesn't filter at all the calls to br_switchdev_port_vlan_add(): bridge vlan add dev br0 vid 100 self This behavior makes driver-level accounting of VLANs impossible, since it is enough for a single deletion event to remove a VLAN, but the addition event can be emitted an unlimited number of times. The cause for this can be identified as follows: we rely on __vlan_add_flags() to retroactively tell us whether it has changed anything about the VLAN flags or VLAN group pvid. So we'd first have to call __vlan_add_flags() before calling br_switchdev_port_vlan_add(), in order to have access to the "bool *changed" information. But we don't want to change the event ordering, because we'd have to revert the struct net_bridge_vlan changes we've made if switchdev returns an error. So to solve this, we need another function that tells us whether any change is going to occur in the VLAN or VLAN group, _prior_ to calling __vlan_add_flags(). Split __vlan_add_flags() into a precommit and a commit stage, and rename it to __vlan_flags_update(). The precommit stage, __vlan_flags_would_change(), will determine whether there is any reason to notify switchdev due to a change of flags (note: the BRENTRY flag transition from false to true is treated separately: as a new switchdev entry, because we skipped notifying the master VLAN when it wasn't a brentry yet, and therefore not as a change of flags). With this lookahead/precommit function in place, we can avoid notifying switchdev if nothing changed for the VLAN and VLAN group. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: bridge: vlan: make __vlan_add_flags react only to PVID and UNTAGGEDVladimir Oltean1-2/+4
Currently there is a very subtle aspect to the behavior of __vlan_add_flags(): it changes the struct net_bridge_vlan flags and pvid, yet it returns true ("changed") even if none of those changed, just a transition of br_vlan_is_brentry(v) took place from false to true. This can be seen in br_vlan_add_existing(), however we do not actually rely on this subtle behavior, since the "if" condition that checks that the vlan wasn't a brentry before had a useless (until now) assignment: *changed = true; Make things more obvious by actually making __vlan_add_flags() do what's written on the box, and be more specific about what is actually written on the box. This is needed because further transformations will be done to __vlan_add_flags(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flagVladimir Oltean1-3/+6
When a VLAN is added to a bridge port and it doesn't exist on the bridge device yet, it gets created for the multicast context, but it is 'hidden', since it doesn't have the BRENTRY flag yet: ip link add br0 type bridge && ip link set swp0 master br0 bridge vlan add dev swp0 vid 100 # the master VLAN 100 gets created bridge vlan add dev br0 vid 100 self # that VLAN becomes brentry just now All switchdev drivers ignore switchdev notifiers for VLAN entries which have the BRENTRY unset, and for good reason: these are merely private data structures used by the bridge driver. So we might just as well not notify those at all. Cleanup in the switchdev drivers that check for the BRENTRY flag is now possible, and will be handled separately, since those checks just became dead code. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16net: bridge: vlan: check early for lack of BRENTRY flag in br_vlan_add_existingVladimir Oltean1-6/+4
When a VLAN is added to a bridge port, a master VLAN gets created on the bridge for context, but it doesn't have the BRENTRY flag. Then, when the same VLAN is added to the bridge itself, that enters through the br_vlan_add_existing() code path and gains the BRENTRY flag, thus it becomes "existing". It seems natural to check for this condition early, because the current code flow is to notify switchdev of the addition of a VLAN that isn't a brentry, just to delete it immediately afterwards. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15net: bridge: vlan: check for errors from __vlan_del in __vlan_flushVladimir Oltean1-1/+8
If the following call path returns an error from switchdev: nbp_vlan_flush -> __vlan_del -> __vlan_vid_del -> br_switchdev_port_vlan_del -> __vlan_group_free -> WARN_ON(!list_empty(&vg->vlan_list)); then the deletion of the net_bridge_vlan is silently halted, which will trigger the WARN_ON from __vlan_group_free(). The WARN_ON is rather unhelpful, because nothing about the source of the error is printed. Add a print to catch errors from __vlan_del. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski1-4/+4
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Remove leftovers from flowtable modules, from Geert Uytterhoeven. 2) Missing refcount increment of conntrack template in nft_ct, from Florian Westphal. 3) Reduce nft_zone selftest time, also from Florian. 4) Add selftest to cover stateless NAT on fragments, from Florian Westphal. 5) Do not set net_device when for reject packets from the bridge path, from Phil Sutter. 6) Cancel register tracking info on nft_byteorder operations. 7) Extend nft_concat_range selftest to cover set reload with no elements, from Florian Westphal. 8) Remove useless update of pointer in chain blob builder, reported by kbuild test robot. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nf_tables: remove assignment with no effect in chain blob builder selftests: nft_concat_range: add test for reload with no element add/del netfilter: nft_byteorder: track register operations netfilter: nft_reject_bridge: Fix for missing reply from prerouting selftests: netfilter: check stateless nat udp checksum fixup selftests: netfilter: reduce zone stress test running time netfilter: nft_ct: fix use after free when attaching zone template netfilter: Remove flowtable relics ==================== Link: https://lore.kernel.org/r/20220127235235.656931-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-27net: bridge: vlan: fix memory leak in __allowed_ingressTim Yi1-3/+3
When using per-vlan state, if vlan snooping and stats are disabled, untagged or priority-tagged ingress frame will go to check pvid state. If the port state is forwarding and the pvid state is not learning/forwarding, untagged or priority-tagged frame will be dropped but skb memory is not freed. Should free skb when __allowed_ingress returns false. Fixes: a580c76d534c ("net: bridge: vlan: add per-vlan state") Signed-off-by: Tim Yi <tim.yi@pica8.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20220127074953.12632-1-tim.yi@pica8.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-27net: bridge: vlan: fix single net device option dumpingNikolay Aleksandrov1-1/+2
When dumping vlan options for a single net device we send the same entries infinitely because user-space expects a 0 return at the end but we keep returning skb->len and restarting the dump on retry. Fix it by returning the value from br_vlan_dump_dev() if it completed or there was an error. The only case that must return skb->len is when the dump was incomplete and needs to continue (-EMSGSIZE). Reported-by: Benjamin Poirier <bpoirier@nvidia.com> Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-27netfilter: nft_reject_bridge: Fix for missing reply from preroutingPhil Sutter1-4/+4
Prior to commit fa538f7cf05aa ("netfilter: nf_reject: add reject skbuff creation helpers"), nft_reject_bridge did not assign to nskb->dev before passing nskb on to br_forward(). The shared skbuff creation helpers introduced in above commit do which seems to confuse br_forward() as reject statements in prerouting hook won't emit a packet anymore. Fix this by simply passing NULL instead of 'dev' to the helpers - they use the pointer for just that assignment, nothing else. Fixes: fa538f7cf05aa ("netfilter: nf_reject: add reject skbuff creation helpers") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-12net: bridge: fix net device refcount tracking issue in error pathEric Dumazet1-1/+2
I left one dev_put() in br_add_if() error path and sure enough syzbot found its way. As the tracker is allocated in new_nbp(), we must make sure to properly free it. We have to call dev_put_track(dev, &p->dev_tracker) before @p object is freed, of course. This is not an issue because br_add_if() owns a reference on @dev. Fixes: b2dcdc7f731d ("net: bridge: add net device refcount tracker") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski1-0/+20
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. This includes one patch to update ovs and act_ct to use nf_ct_put() instead of nf_conntrack_put(). 1) Add netns_tracker to nfnetlink_log and masquerade, from Eric Dumazet. 2) Remove redundant rcu read-size lock in nf_tables packet path. 3) Replace BUG() by WARN_ON_ONCE() in nft_payload. 4) Consolidate rule verdict tracing. 5) Replace WARN_ON() by WARN_ON_ONCE() in nf_tables core. 6) Make counter support built-in in nf_tables. 7) Add new field to conntrack object to identify locally generated traffic, from Florian Westphal. 8) Prevent NAT from shadowing well-known ports, from Florian Westphal. 9) Merge nf_flow_table_{ipv4,ipv6} into nf_flow_table_inet, also from Florian. 10) Remove redundant pointer in nft_pipapo AVX2 support, from Colin Ian King. 11) Replace opencoded max() in conntrack, from Jiapeng Chong. 12) Update conntrack to use refcount_t API, from Florian Westphal. 13) Move ip_ct_attach indirection into the nf_ct_hook structure. 14) Constify several pointer object in the netfilter codebase, from Florian Westphal. 15) Tree-wide replacement of nf_conntrack_put() by nf_ct_put(), also from Florian. 16) Fix egress splat due to incorrect rcu notation, from Florian. 17) Move stateful fields of connlimit, last, quota, numgen and limit out of the expression data area. 18) Build a blob to represent the ruleset in nf_tables, this is a requirement of the new register tracking infrastructure. 19) Add NFT_REG32_NUM to define the maximum number of 32-bit registers. 20) Add register tracking infrastructure to skip redundant store-to-register operations, this includes support for payload, meta and bitwise expresssions. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: (32 commits) netfilter: nft_meta: cancel register tracking after meta update netfilter: nft_payload: cancel register tracking after payload update netfilter: nft_bitwise: track register operations netfilter: nft_meta: track register operations netfilter: nft_payload: track register operations netfilter: nf_tables: add register tracking infrastructure netfilter: nf_tables: add NFT_REG32_NUM netfilter: nf_tables: add rule blob layout netfilter: nft_limit: move stateful fields out of expression data netfilter: nft_limit: rename stateful structure netfilter: nft_numgen: move stateful fields out of expression data netfilter: nft_quota: move stateful fields out of expression data netfilter: nft_last: move stateful fields out of expression data netfilter: nft_connlimit: move stateful fields out of expression data netfilter: egress: avoid a lockdep splat net: prefer nf_ct_put instead of nf_conntrack_put netfilter: conntrack: avoid useless indirection during conntrack destruction netfilter: make function op structures const netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook netfilter: conntrack: convert to refcount_t api ... ==================== Link: https://lore.kernel.org/r/20220109231640.104123-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-10netfilter: nft_meta: cancel register tracking after meta updatePablo Neira Ayuso1-0/+20
The meta expression might mangle the packet metadata, cancel register tracking since any metadata in the registers is stale. Finer grain register tracking cancellation by inspecting the meta type on the register is also possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>