summaryrefslogtreecommitdiff
path: root/net/bridge
AgeCommit message (Collapse)AuthorFilesLines
2019-04-23bridge: Fix possible use-after-free when deleting bridge portIdo Schimmel1-1/+2
When a bridge port is being deleted, do not dereference it later in br_vlan_port_event() as it can result in a use-after-free [1] if the RCU callback was executed before invoking the function. [1] [ 129.638551] ================================================================== [ 129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd [ 129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483 [ 129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383 [ 129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 129.682484] Call Trace: [ 129.685242] dump_stack+0xa9/0x10e [ 129.689068] print_address_description.cold.2+0x9/0x25e [ 129.694930] kasan_report.cold.3+0x78/0x9d [ 129.704420] br_vlan_port_event+0x53c/0x5fd [ 129.728300] br_device_event+0x2c7/0x7a0 [ 129.741505] notifier_call_chain+0xb5/0x1c0 [ 129.746202] rollback_registered_many+0x895/0xe90 [ 129.793119] unregister_netdevice_many+0x48/0x210 [ 129.803384] rtnl_delete_link+0xe1/0x140 [ 129.815906] rtnl_dellink+0x2a3/0x820 [ 129.844166] rtnetlink_rcv_msg+0x397/0x910 [ 129.868517] netlink_rcv_skb+0x137/0x3a0 [ 129.882013] netlink_unicast+0x49b/0x660 [ 129.900019] netlink_sendmsg+0x755/0xc90 [ 129.915758] ___sys_sendmsg+0x761/0x8e0 [ 129.966315] __sys_sendmsg+0xf0/0x1c0 [ 129.988918] do_syscall_64+0xa4/0x470 [ 129.993032] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 129.998696] RIP: 0033:0x7ff578104b58 ... [ 130.073811] Allocated by task 479: [ 130.077633] __kasan_kmalloc.constprop.5+0xc1/0xd0 [ 130.083008] kmem_cache_alloc_trace+0x152/0x320 [ 130.088090] br_add_if+0x39c/0x1580 [ 130.092005] do_set_master+0x1aa/0x210 [ 130.096211] do_setlink+0x985/0x3100 [ 130.100224] __rtnl_newlink+0xc52/0x1380 [ 130.104625] rtnl_newlink+0x6b/0xa0 [ 130.108541] rtnetlink_rcv_msg+0x397/0x910 [ 130.113136] netlink_rcv_skb+0x137/0x3a0 [ 130.117538] netlink_unicast+0x49b/0x660 [ 130.121939] netlink_sendmsg+0x755/0xc90 [ 130.126340] ___sys_sendmsg+0x761/0x8e0 [ 130.130645] __sys_sendmsg+0xf0/0x1c0 [ 130.134753] do_syscall_64+0xa4/0x470 [ 130.138864] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 130.146195] Freed by task 0: [ 130.149421] __kasan_slab_free+0x125/0x170 [ 130.154016] kfree+0xf3/0x310 [ 130.157349] kobject_put+0x1a8/0x4c0 [ 130.161363] rcu_core+0x859/0x19b0 [ 130.165175] __do_softirq+0x250/0xa26 [ 130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8 which belongs to the cache kmalloc-1k of size 1024 [ 130.184972] The buggy address is located 0 bytes inside of 1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8) Fixes: 9c0ec2e7182a ("bridge: support binding vlan dev link state to vlan member bridge ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+2
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Add a selftest for icmp packet too big errors with conntrack, from Florian Westphal. 2) Validate inner header in ICMP error message does not lie to us in conntrack, also from Florian. 3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko. 4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov. 5) Use a hash to expose conntrack and expectation ID, from Florian Westphal. 6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter. 7) Fix broken ICMP ID randomization with NAT, also from Florian. 8) Remove WARN_ON in ebtables compat that is reached via syzkaller, from Florian Westphal. 9) Fix broken timestamps since fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC"), from Florian. 10) Fix logging of invalid packets in conntrack, from Andrei Vagin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ONFlorian Westphal1-1/+2
It means userspace gave us a ruleset where there is some other data after the ebtables target but before the beginning of the next rule. Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-19bridge: update vlan dev link state for bridge netdev changesMike Manning1-3/+47
If vlan bridge binding is enabled, then the link state of a vlan device that is an upper device of the bridge tracks the state of bridge ports that are members of that vlan. But this can only be done when the link state of the bridge is up. If it is down, then the link state of the vlan devices must also be down. This is to maintain existing behavior for when STP is enabled and there are no live ports, in which case the link state for the bridge and any vlan devices is down. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19bridge: update vlan dev state when port added to or deleted from vlanMike Manning1-0/+19
If vlan bridge binding is enabled, then the link state of a vlan device that is an upper device of the bridge should track the state of bridge ports that are members of that vlan. So if a bridge port becomes or stops being a member of a vlan, then update the link state of the vlan device if necessary. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19bridge: support binding vlan dev link state to vlan member bridge portsMike Manning3-4/+174
In the case of vlan filtering on bridges, the bridge may also have the corresponding vlan devices as upper devices. A vlan bridge binding mode is added to allow the link state of the vlan device to track only the state of the subset of bridge ports that are also members of the vlan, rather than that of all bridge ports. This mode is set with a vlan flag rather than a bridge sysfs so that the 8021q module is aware that it should not set the link state for the vlan device. If bridge vlan is configured, the bridge device event handling results in the link state for an upper device being set, if it is a vlan device with the vlan bridge binding mode enabled. This also sets a vlan_bridge_binding flag so that subsequent UP/DOWN/CHANGE events for the ports in that bridge result in a link state update of the vlan device if required. The link state of the vlan device is up if there is at least one bridge port that is a vlan member that is admin & oper up, otherwise its oper state is IF_OPER_LOWERLAYERDOWN. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-11/+18
Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17net: bridge: fix netlink export of vlan_stats_per_port optionNikolay Aleksandrov1-1/+1
Since the introduction of the vlan_stats_per_port option the netlink export of it has been broken since I made a typo and used the ifla attribute instead of the bridge option to retrieve its state. Sysfs export is fine, only netlink export has been affected. Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17net: bridge: fix per-port af_packet socketsNikolay Aleksandrov1-9/+14
When the commit below was introduced it changed two visible things: - the skb was no longer passed through the protocol handlers with the original device - the skb was passed up the stack with skb->dev = bridge The first change broke af_packet sockets on bridge ports. For example we use them for hostapd which listens for ETH_P_PAE packets on the ports. We discussed two possible fixes: - create a clone and pass it through NF_HOOK(), act on the original skb based on the result - somehow signal to the caller from the okfn() that it was called, meaning the skb is ok to be passed, which this patch is trying to implement via returning 1 from the bridge link-local okfn() Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and drop/error would return < 0 thus the okfn() is called only when the return was 1, so we signal to the caller that it was called by preserving the return value from nf_hook(). Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15bridge: only include nf_queue.h if neededStephen Rothwell1-0/+2
After merging the netfilter-next tree, today's linux-next build (powerpc ppc44x_defconfig) failed like this: In file included from net/bridge/br_input.c:19: include/net/netfilter/nf_queue.h:16:23: error: field 'state' has incomplete type struct nf_hook_state state; ^~~~~ Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12bridge: broute: make broute a real ebtables tableFlorian Westphal4-39/+52
This makes broute a normal ebtables table, hooking at PREROUTING. The broute hook is removed. It uses skb->cb to signal to bridge rx handler that the skb should be routed instead of being bridged. This change is backwards compatible with ebtables as no userspace visible parts are changed. This means we can also remove the !ops test in ebt_register_table, it was only there for broute table sake. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12bridge: netfilter: unroll NF_HOOK helper in bridge input pathFlorian Westphal1-4/+51
Replace NF_HOOK() based invocation of the netfilter hooks with a private copy of nf_hook_slow(). This copy has one difference: it can return the rx handler value expected by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS. This is needed by the next patch to invoke the ebtables "broute" table via the standard netfilter hooks rather than the custom "br_should_route_hook" indirection that is used now. When the skb is to be "brouted", we must return RX_HANDLER_PASS from the bridge rx input handler, but there is no way to indicate this via NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the netfilter core or a percpu flag. text data bss dec filename 3369 56 0 3425 net/bridge/br_input.o.before 3458 40 0 3498 net/bridge/br_input.o.after This allows removal of the "br_should_route_hook" in the next patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12bridge: reduce size of input cb to 16 bytesFlorian Westphal3-16/+16
Reduce size of br_input_skb_cb from 24 to 16 bytes by using bitfield for those values that can only be 0 or 1. igmp is the igmp type value, so it needs to be at least u8. Furthermore, the bridge currently relies on step-by-step initialization of br_input_skb_cb fields as the skb passes through the stack. Explicitly zero out the bridge input cb instead, this avoids having to review/validate that no BR_INPUT_SKB_CB(skb)->foo test can see a 'random' value from previous protocol cb. AFAICS all current fields are always set up before they are read again, so this is not a bug fix. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11net: bridge: multicast: use rcu to access port list from ↵Nikolay Aleksandrov1-1/+3
br_multicast_start_querier br_multicast_start_querier() walks over the port list but it can be called from a timer with only multicast_lock held which doesn't protect the port list, so use RCU to walk over it. Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08rhashtable: use bit_spin_locks to protect hash bucket.NeilBrown4-4/+0
This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the bucket pointer to lock the hash chain for that bucket. The benefits of a bit spin_lock are: - no need to allocate a separate array of locks. - no need to have a configuration option to guide the choice of the size of this array - locking cost is often a single test-and-set in a cache line that will have to be loaded anyway. When inserting at, or removing from, the head of the chain, the unlock is free - writing the new address in the bucket head implicitly clears the lock bit. For __rhashtable_insert_fast() we ensure this always happens when adding a new key. - even when lockings costs 2 updates (lock and unlock), they are in a cacheline that needs to be read anyway. The cost of using a bit spin_lock is a little bit of code complexity, which I think is quite manageable. Bit spin_locks are sometimes inappropriate because they are not fair - if multiple CPUs repeatedly contend of the same lock, one CPU can easily be starved. This is not a credible situation with rhashtable. Multiple CPUs may want to repeatedly add or remove objects, but they will typically do so at different buckets, so they will attempt to acquire different locks. As we have more bit-locks than we previously had spinlocks (by at least a factor of two) we can expect slightly less contention to go with the slightly better cache behavior and reduced memory consumption. To enhance type checking, a new struct is introduced to represent the pointer plus lock-bit that is stored in the bucket-table. This is "struct rhash_lock_head" and is empty. A pointer to this needs to be cast to either an unsigned lock, or a "struct rhash_head *" to be useful. Variables of this type are most often called "bkt". Previously "pprev" would sometimes point to a bucket, and sometimes a ->next pointer in an rhash_head. As these are now different types, pprev is NULL when it would have pointed to the bucket. In that case, 'blk' is used, together with correct locking protocol. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+3
Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05net: bridge: mcast: remove unused br_ip_equal functionNikolay Aleksandrov1-17/+0
Since the mcast conversion to rhashtable this function has been unused, so remove it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05net: bridge: always clear mcast matching struct on reports and leavesNikolay Aleksandrov1-0/+3
We need to be careful and always zero the whole br_ip struct when it is used for matching since the rhashtable change. This patch fixes all the places which didn't properly clear it which in turn might've caused mismatches. Thanks for the great bug report with reproducing steps and bisection. Steps to reproduce (from the bug report): ip link add br0 type bridge mcast_querier 1 ip link set br0 up ip link add v2 type veth peer name v3 ip link set v2 master br0 ip link set v2 up ip link set v3 up ip addr add 3.0.0.2/24 dev v3 ip netns add test ip link add v1 type veth peer name v1 netns test ip link set v1 master br0 ip link set v1 up ip -n test link set v1 up ip -n test addr add 3.0.0.1/24 dev v1 # Multicast receiver ip netns exec test socat UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork - # Multicast sender echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588 Reported-by: liam.mcbirnie@boeing.com Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05net: bridge: optimize backup_port fdb convergenceNikolay Aleksandrov1-1/+2
We can optimize the fdb convergence when a backup_port is present by not immediately flushing the entries of the stopped port since traffic for those entries will flow towards the backup_port. There are 2 cases specifically that benefit most: - when the stopped port comes up before the entries expire by themselves - when there's an external entry refresh and they're kept while the backup_port is operating (e.g. mlag) Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04net: bridge: update multicast stats from maybe_deliver()Pablo Neira Ayuso1-11/+4
Simplify this code by updating bridge multicast stats from maybe_deliver(). Note that commit 6db6f0eae605 ("bridge: multicast to unicast"), in case the port flag BR_MULTICAST_TO_UNICAST is set, never updates the previous port pointer, therefore it is always going to be different from the existing port in this deduplicated list iteration. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29net: bridge: use netif_is_bridge_port()Julian Wiedmann4-9/+7
Replace the br_port_exists() macro with its twin from netdevice.h CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29ipv6: Move ipv6 stubs to a separate header fileDavid Ahern1-0/+1
The number of stubs is growing and has nothing to do with addrconf. Move the definition of the stubs to a separate header file and update users. In the move, drop the vxlan specific comment before ipv6_stub. Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-0/+3
2019-03-20net: bridge: use eth_broadcast_addr() to assign broadcast addressMao Wenan1-1/+1
This patch is to use eth_broadcast_addr() to assign broadcast address insetad of memset(). Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-18netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTINGXin Long2-0/+3
Since Commit 21d1196a35f5 ("ipv4: set transport header earlier"), skb->transport_header has been always set before entering INET netfilter. This patch is to set skb->transport_header for bridge before entering INET netfilter by bridge-nf-call-iptables. It also fixes an issue that sctp_error() couldn't compute a right csum due to unset skb->transport_header. Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code") Reported-by: Li Shuang <shuali@redhat.com> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2-96/+44
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Add .release_ops to properly unroll .select_ops, use it from nft_compat. After this change, we can remove list of extensions too to simplify this codebase. 2) Update amanda conntrack helper to support v3.4, from Florian Tham. 3) Get rid of the obsolete BUGPRINT macro in ebtables, from Florian Westphal. 4) Merge IPv4 and IPv6 masquerading infrastructure into one single module. From Florian Westphal. 5) Patchset to remove nf_nat_l3proto structure to get rid of indirections, from Florian Westphal. 6) Skip unnecessary conntrack timeout updates in case the value is still the same, also from Florian Westphal. 7) Remove unnecessary 'fall through' comments in empty switch cases, from Li RongQing. 8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys. 9) Incorrect logic to deactivate path of fixed size hashtable sets, element was being tested to self. 10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit keys. 11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi. 12) Enter close state in conntrack if RST matches exact sequence number, from Florian Westphal. 13) Initialize dst_cache in tunnel extension, from wenxu. 14) Pass protocol as u16 to xt_check_match and xt_check_target, from Li RongQing. 15) SCTP header is granted to be in a linear area from IPVS NAT handler, from Xin Long. 16) Don't steal packets coming from slave VRF device from the ip_sabotage_in() path, from David Ahern. 17) Fix unsafe update of basechain stats, from Li RongQing. 18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize modulo operation as bitwise AND, from Li RongQing. 19) Use device_attribute instead of internal definition in the IDLETIMER target, from Sami Tolvanen. 20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slaveDavid Ahern1-1/+2
Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in to not drop packets for slave devices too. Currently, neighbor solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting dropped. This patch enables IPv6 communications for bridges with an address that are enslaved to a VRF. Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: convert the proto argument from u8 to u16Li RongQing1-3/+3
The proto in struct xt_match and struct xt_target is u16, when calling xt_check_target/match, their proto argument is u8, and will cause truncation, it is harmless to ip packet, since ip proto is u8 if a etable's match/target has proto that is u16, will cause the check failure. and convert be16 to short in bridge/netfilter/ebtables.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27net: switchdev: Replace port attr set SDO with a notificationFlorian Fainelli1-1/+7
Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field from all clients, which were migrated to use switchdev notification in the previous patches. Add a new function switchdev_port_attr_notify() that sends the switchdev notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process) notifier chain. We have one odd case within net/bridge/br_switchdev.c with the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that requires executing from atomic context, we deal with that one specifically. Drop __switchdev_port_attr_set() and update switchdev_port_attr_set() likewise. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27netfilter: ebtables: remove BUGPRINT messagesFlorian Westphal1-92/+39
They are however frequently triggered by syzkaller, so remove them. ebtables userspace should never trigger any of these, so there is little value in making them pr_debug (or ratelimited). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-8/+1
Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Revert "bridge: do not add port to router list when receives query with ↵Hangbin Liu1-8/+1
source 0.0.0.0" This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries") The reason is RFC 4541 is not a standard but suggestive. Currently we will elect 0.0.0.0 as Querier if there is no ip address configured on bridge. If we do not add the port which recives query with source 0.0.0.0 to router list, the IGMP reports will not be about to forward to Querier, IGMP data will also not be able to forward to dest. As Nikolay suggested, revert this change first and add a boolopt api to disable none-zero election in future if needed. Reported-by: Linus Lüssing <linus.luessing@c0d3.blue> Reported-by: Sebastian Gottschall <s.gottschall@newmedia-net.de> Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0") Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: bridge: Stop calling switchdev_port_attr_get()Florian Fainelli1-6/+5
Now that all switchdev drivers have been converted to check the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS flags and report flags that they do not support accordingly, we can migrate the bridge code to try to set that attribute first, check the results and then do the actual setting. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22bridge: remove redundant check on err in br_multicast_ipv4_rcvLi RongQing1-6/+1
br_ip4_multicast_mrd_rcv only return 0 and -ENOMSG, no other negative value Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-5/+5
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for you net-next tree: 1) Missing NFTA_RULE_POSITION_ID netlink attribute validation, from Phil Sutter. 2) Restrict matching on tunnel metadata to rx/tx path, from wenxu. 3) Avoid indirect calls for IPV6=y, from Florian Westphal. 4) Add two indirections to prepare merger of IPV4 and IPV6 nat modules, from Florian Westphal. 5) Broken indentation in ctnetlink, from Colin Ian King. 6) Patches to use struct_size() from netfilter and IPVS, from Gustavo A. R. Silva. 7) Display kernel splat only once in case of racing to confirm conntrack from bridge plus nfqueue setups, from Chieh-Min Wang. 8) Skip checksum validation for layer 4 protocols that don't need it, patch from Alin Nastac. 9) Sparse warning due to symbol that should be static in CLUSTERIP, from Wei Yongjun. 10) Add new toggle to disable SDP payload translation when media endpoint is reachable though the same interface as the signalling peer, from Alin Nastac. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13netfilter: reject: skip csum verification for protocols that don't support itAlin Nastac1-5/+5
Some protocols have other means to verify the payload integrity (AH, ESP, SCTP) while others are incompatible with nf_ip(6)_checksum implementation because checksum is either optional or might be partial (UDPLITE, DCCP, GRE). Because nf_ip(6)_checksum was used to validate the packets, ip(6)tables REJECT rules were not capable to generate ICMP(v6) errors for the protocols mentioned above. This commit also fixes the incorrect pseudo-header protocol used for IPv4 packets that carry other transport protocols than TCP or UDP (pseudo-header used protocol 0 iso the proper value). Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-09bridge: use struct_size() helperGustavo A. R. Silva1-2/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = alloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_IDFlorian Fainelli1-11/+3
Now that we have a dedicated NDO for getting a port's parent ID, get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the NDO exclusively. This is a preliminary change to getting rid of switchdev_ops eventually. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net: Introduce ndo_get_port_parent_id()Florian Fainelli1-2/+7
In preparation for getting rid of switchdev_ops, create a dedicated NDO operation for getting the port's parent identifier. There are essentially two classes of drivers that need to implement getting the port's parent ID which are VF/PF drivers with a built-in switch, and pure switchdev drivers such as mlxsw, ocelot, dsa etc. We introduce a helper function: dev_get_port_parent_id() which supports recursion into the lower devices to obtain the first port's parent ID. Convert the bridge, core and ipv4 multicast routing code to check for such ndo_get_port_parent_id() and call the helper function when valid before falling back to switchdev_port_attr_get(). This will allow us to convert all relevant drivers in one go instead of having to implement both switchdev_port_attr_get() and ndo_get_port_parent_id() operations, then get rid of switchdev_port_attr_get(). Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03net: Fix ip_mc_{dec,inc}_group allocation contextFlorian Fainelli1-2/+2
After 4effd28c1245 ("bridge: join all-snoopers multicast address"), I started seeing the following sleep in atomic warnings: [ 26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh [ 26.777855] INFO: lockdep is turned off. [ 26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20 [ 26.787943] Hardware name: BCM97278SV (DT) [ 26.792118] Call trace: [ 26.794645] dump_backtrace+0x0/0x170 [ 26.798391] show_stack+0x24/0x30 [ 26.801787] dump_stack+0xa4/0xe4 [ 26.805182] ___might_sleep+0x208/0x218 [ 26.809102] __might_sleep+0x78/0x88 [ 26.812762] kmem_cache_alloc_trace+0x64/0x28c [ 26.817301] igmp_group_dropped+0x150/0x230 [ 26.821573] ip_mc_dec_group+0x1b0/0x1f8 [ 26.825585] br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190 [ 26.831704] br_multicast_toggle+0x78/0xcc [ 26.835887] store_bridge_parm+0xc4/0xfc [ 26.839894] multicast_snooping_store+0x3c/0x4c [ 26.844517] dev_attr_store+0x44/0x5c [ 26.848262] sysfs_kf_write+0x50/0x68 [ 26.852006] kernfs_fop_write+0x14c/0x1b4 [ 26.856102] __vfs_write+0x60/0x190 [ 26.859668] vfs_write+0xc8/0x168 [ 26.863059] ksys_write+0x70/0xc8 [ 26.866449] __arm64_sys_write+0x24/0x30 [ 26.870458] el0_svc_common+0xa0/0x11c [ 26.874291] el0_svc_handler+0x38/0x70 [ 26.878120] el0_svc+0x8/0xc while toggling the bridge's multicast_snooping attribute dynamically. Pass a gfp_t down to igmpv3_add_delrec(), introduce __igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t argument. Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to allow caller to specify gfp_t. IPv6 part of the patch appears fine. Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+6
2019-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-5/+0
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree: 1) Introduce a hashtable to speed up object lookups, from Florian Westphal. 2) Make direct calls to built-in extension, also from Florian. 3) Call helper before confirming the conntrack as it used to be originally, from Florian. 4) Call request_module() to autoload br_netfilter when physdev is used to relax the dependency, also from Florian. 5) Allow to insert rules at a given position ID that is internal to the batch, from Phil Sutter. 6) Several patches to replace conntrack indirections by direct calls, and to reduce modularization, from Florian. This also includes several follow up patches to deal with minor fallout from this rework. 7) Use RCU from conntrack gre helper, from Florian. 8) GRE conntrack module becomes built-in into nf_conntrack, from Florian. 9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(), from Florian. 10) Unify sysctl handling at the core of nf_conntrack, from Florian. 11) Provide modparam to register conntrack hooks. 12) Allow to match on the interface kind string, from wenxu. 13) Remove several exported symbols, not required anymore now after a bit of de-modulatization work has been done, from Florian. 14) Remove built-in map support in the hash extension, this can be done with the existing userspace infrastructure, from laura. 15) Remove indirection to calculate checksums in IPVS, from Matteo Croce. 16) Use call wrappers for indirection in IPVS, also from Matteo. 17) Remove superfluous __percpu parameter in nft_counter, patch from Luc Van Oostenryck. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-3/+6
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) The nftnl mutex is now per-netns, therefore use reference counter for matches and targets to deal with concurrent updates from netns. Moreover, place extensions in a pernet list. Patches from Florian Westphal. 2) Bail out with EINVAL in case of negative timeouts via setsockopt() through ip_vs_set_timeout(), from ZhangXiaoxu. 3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also from Florian. 4) Reset TCP option header parser in case of fingerprint mismatch, otherwise follow up overlapping fingerprint definitions including TCP options do not work, from Fernando Fernandez Mancera. 5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset. From Anders Roxell. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are presentFlorian Westphal1-3/+6
Unlike ip(6)tables ebtables only counts user-defined chains. The effect is that a 32bit ebtables binary on a 64bit kernel can do 'ebtables -N FOO' only after adding at least one rule, else the request fails with -EINVAL. This is a similar fix as done in 3f1e53abff84 ("netfilter: ebtables: don't attempt to allocate 0-sized compat array"). Fixes: 7d7d7e02111e9 ("netfilter: compat: reject huge allocation requests") Reported-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-25bridge: remove duplicated include from br_multicast.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23bridge: Snoop Multicast Router AdvertisementsLinus Lüssing1-0/+55
When multiple multicast routers are present in a broadcast domain then only one of them will be detectable via IGMP/MLD query snooping. The multicast router with the lowest IP address will become the selected and active querier while all other multicast routers will then refrain from sending queries. To detect such rather silent multicast routers, too, RFC4286 ("Multicast Router Discovery") provides a standardized protocol to detect multicast routers for multicast snooping switches. This patch implements the necessary MRD Advertisement message parsing and after successful processing adds such routers to the internal multicast router list. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23bridge: join all-snoopers multicast addressLinus Lüssing1-1/+71
Next to snooping IGMP/MLD queries RFC4541, section 2.1.1.a) recommends to snoop multicast router advertisements to detect multicast routers. Multicast router advertisements are sent to an "all-snoopers" multicast address. To be able to receive them reliably, we need to join this group. Otherwise other snooping switches might refrain from forwarding these advertisements to us. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() callsLinus Lüssing1-29/+28
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and their callers (more precisely, the Linux bridge) to not rely on the skb_trimmed parameter anymore. An skb with its tail trimmed to the IP packet length was initially introduced for the following three reasons: 1) To be able to verify the ICMPv6 checksum. 2) To be able to distinguish the version of an IGMP or MLD query. They are distinguishable only by their size. 3) To avoid parsing data for an IGMPv3 or MLDv2 report that is beyond the IP packet but still within the skb. The first case still uses a cloned and potentially trimmed skb to verfiy. However, there is no need to propagate it to the caller. For the second and third case explicit IP packet length checks were added. This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier to read and verfiy, as well as easier to use. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-7/+15
Completely minor snmp doc conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: bridge: Mark FDB entries that were added by user as suchIdo Schimmel1-0/+5
Externally learned entries can be added by a user or by a switch driver that is notifying the bridge driver about entries that were learned in hardware. In the first case, the entries are not marked with the 'added_by_user' flag, which causes switch drivers to ignore them and not offload them. The 'added_by_user' flag can be set on externally learned FDB entries based on the 'swdev_notify' parameter in br_fdb_external_learn_add(), which effectively means if the created / updated FDB entry was added by a user or not. Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: David S. Miller <davem@davemloft.net>