<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/vxlan.c, branch v5.9.12</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.9.12</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.9.12'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-08-05T19:09:10+00:00</updated>
<entry>
<title>Revert "vxlan: fix tos value before xmit"</title>
<updated>2020-08-05T19:09:10+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2020-08-05T02:41:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a0dced17ad9dc08b1b25e0065b54c97a318e6e8b'/>
<id>urn:sha1:a0dced17ad9dc08b1b25e0065b54c97a318e6e8b</id>
<content type='text'>
This reverts commit 71130f29979c7c7956b040673e6b9d5643003176.

In commit 71130f29979c ("vxlan: fix tos value before xmit") we want to
make sure the tos value are filtered by RT_TOS() based on RFC1349.

       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |   PRECEDENCE    |          TOS          | MBZ |
    +-----+-----+-----+-----+-----+-----+-----+-----+

But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like

       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |          DS FIELD, DSCP           | ECN FIELD |
    +-----+-----+-----+-----+-----+-----+-----+-----+

So with

IPTOS_TOS_MASK          0x1E
RT_TOS(tos)		((tos)&amp;IPTOS_TOS_MASK)

the first 3 bits DSCP info will get lost.

To take all the DSCP info in xmit, we should revert the patch and just push
all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later.

Fixes: 71130f29979c ("vxlan: fix tos value before xmit")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Acked-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: Support for PMTU discovery on directly bridged links</title>
<updated>2020-08-04T20:01:45+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2020-08-04T05:53:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fc68c99577cc66e38d11b3e29304efb83fa08d53'/>
<id>urn:sha1:fc68c99577cc66e38d11b3e29304efb83fa08d53</id>
<content type='text'>
If the interface is a bridge or Open vSwitch port, and we can't
forward a packet because it exceeds the local PMTU estimate,
trigger an ICMP or ICMPv6 reply to the sender, using the same
interface to forward it back.

If metadata collection is enabled, reverse destination and source
addresses, so that Open vSwitch is able to match this packet against
the existing, reverse flow.

v2: Use netif_is_any_bridge_port() (David Ahern)

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tunnels: PMTU discovery support for directly bridged IP packets</title>
<updated>2020-08-04T20:01:45+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2020-08-04T05:53:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4cb47a8644cc9eb8ec81190a50e79e6530d0297f'/>
<id>urn:sha1:4cb47a8644cc9eb8ec81190a50e79e6530d0297f</id>
<content type='text'>
It's currently possible to bridge Ethernet tunnels carrying IP
packets directly to external interfaces without assigning them
addresses and routes on the bridged network itself: this is the case
for UDP tunnels bridged with a standard bridge or by Open vSwitch.

PMTU discovery is currently broken with those configurations, because
the encapsulation effectively decreases the MTU of the link, and
while we are able to account for this using PMTU discovery on the
lower layer, we don't have a way to relay ICMP or ICMPv6 messages
needed by the sender, because we don't have valid routes to it.

On the other hand, as a tunnel endpoint, we can't fragment packets
as a general approach: this is for instance clearly forbidden for
VXLAN by RFC 7348, section 4.3:

   VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
   fragment encapsulated VXLAN packets due to the larger frame size.
   The destination VTEP MAY silently discard such VXLAN fragments.

The same paragraph recommends that the MTU over the physical network
accomodates for encapsulations, but this isn't a practical option for
complex topologies, especially for typical Open vSwitch use cases.

Further, it states that:

   Other techniques like Path MTU discovery (see [RFC1191] and
   [RFC1981]) MAY be used to address this requirement as well.

Now, PMTU discovery already works for routed interfaces, we get
route exceptions created by the encapsulation device as they receive
ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
we already rebuild those messages with the appropriate MTU and route
them back to the sender.

Add the missing bits for bridged cases:

- checks in skb_tunnel_check_pmtu() to understand if it's appropriate
  to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
  RFC 4443 section 2.4 for ICMPv6. This function is already called by
  UDP tunnels

- a new function generating those ICMP or ICMPv6 replies. We can't
  reuse icmp_send() and icmp6_send() as we don't see the sender as a
  valid destination. This doesn't need to be generic, as we don't
  cover any other type of ICMP errors given that we only provide an
  encapsulation function to the sender

While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
we might receive GSO buffers here, and the passed headroom already
includes the inner MAC length, so we don't have to account for it
a second time (that would imply three MAC headers on the wire, but
there are just two).

This issue became visible while bridging IPv6 packets with 4500 bytes
of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
bytes of encapsulation headroom, we would advertise MTU as 3950, and
we would reject fragmented IPv6 datagrams of 3958 bytes size on the
wire. We're exclusively dealing with network MTU here, though, so we
could get Ethernet frames up to 3964 octets in that case.

v2:
- moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
- split IPv4/IPv6 functions (David Ahern)

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2020-08-02T08:02:12+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-08-02T08:02:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bd0b33b24897ba9ddad221e8ac5b6f0e38a2e004'/>
<id>urn:sha1:bd0b33b24897ba9ddad221e8ac5b6f0e38a2e004</id>
<content type='text'>
Resolved kernel/bpf/btf.c using instructions from merge commit
69138b34a7248d2396ab85c8652e20c0c39beaba

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: fix memleak of fdb</title>
<updated>2020-08-01T18:49:18+00:00</updated>
<author>
<name>Taehee Yoo</name>
<email>ap420073@gmail.com</email>
</author>
<published>2020-08-01T07:07:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fda2ec62cf1aa7cbee52289dc8059cd3662795da'/>
<id>urn:sha1:fda2ec62cf1aa7cbee52289dc8059cd3662795da</id>
<content type='text'>
When vxlan interface is deleted, all fdbs are deleted by vxlan_flush().
vxlan_flush() flushes fdbs but it doesn't delete fdb, which contains
all-zeros-mac because it is deleted by vxlan_uninit().
But vxlan_uninit() deletes only the fdb, which contains both all-zeros-mac
and default vni.
So, the fdb, which contains both all-zeros-mac and non-default vni
will not be deleted.

Test commands:
    ip link add vxlan0 type vxlan dstport 4789 external
    ip link set vxlan0 up
    bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \
	    src_vni 10000 self permanent
    ip link del vxlan0

kmemleak reports as follows:
unreferenced object 0xffff9486b25ced88 (size 96):
  comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s)
  hex dump (first 32 bytes):
    02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff  ........@.......
    46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b  F.............jk
  backtrace:
    [&lt;00000000c10cf651&gt;] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan]
    [&lt;000000006b31a8d9&gt;] vxlan_fdb_create+0x184/0x1a0 [vxlan]
    [&lt;0000000049399045&gt;] vxlan_fdb_update+0x12f/0x220 [vxlan]
    [&lt;0000000090b1ef00&gt;] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
    [&lt;0000000056633c2c&gt;] rtnl_fdb_add+0x187/0x270
    [&lt;00000000dd5dfb6b&gt;] rtnetlink_rcv_msg+0x264/0x490
    [&lt;00000000fc44dd54&gt;] netlink_rcv_skb+0x4a/0x110
    [&lt;00000000dff433e7&gt;] netlink_unicast+0x18e/0x250
    [&lt;00000000b87fb421&gt;] netlink_sendmsg+0x2e9/0x400
    [&lt;000000002ed55153&gt;] ____sys_sendmsg+0x237/0x260
    [&lt;00000000faa51c66&gt;] ___sys_sendmsg+0x88/0xd0
    [&lt;000000006c3982f1&gt;] __sys_sendmsg+0x4e/0x80
    [&lt;00000000a8f875d2&gt;] do_syscall_64+0x56/0xe0
    [&lt;000000003610eefa&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff9486b1c40080 (size 128):
  comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff  ..........B.....
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  backtrace:
    [&lt;00000000a2981b60&gt;] vxlan_fdb_create+0x67/0x1a0 [vxlan]
    [&lt;0000000049399045&gt;] vxlan_fdb_update+0x12f/0x220 [vxlan]
    [&lt;0000000090b1ef00&gt;] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
    [&lt;0000000056633c2c&gt;] rtnl_fdb_add+0x187/0x270
    [&lt;00000000dd5dfb6b&gt;] rtnetlink_rcv_msg+0x264/0x490
    [&lt;00000000fc44dd54&gt;] netlink_rcv_skb+0x4a/0x110
    [&lt;00000000dff433e7&gt;] netlink_unicast+0x18e/0x250
    [&lt;00000000b87fb421&gt;] netlink_sendmsg+0x2e9/0x400
    [&lt;000000002ed55153&gt;] ____sys_sendmsg+0x237/0x260
    [&lt;00000000faa51c66&gt;] ___sys_sendmsg+0x88/0xd0
    [&lt;000000006c3982f1&gt;] __sys_sendmsg+0x4e/0x80
    [&lt;00000000a8f875d2&gt;] do_syscall_64+0x56/0xe0
    [&lt;000000003610eefa&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Signed-off-by: Taehee Yoo &lt;ap420073@gmail.com&gt;
Acked-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: Ensure FDB dump is performed under RCU</title>
<updated>2020-07-29T19:04:54+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2020-07-29T08:34:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5141915b5aec3b29a63db869229e3741ebce258'/>
<id>urn:sha1:b5141915b5aec3b29a63db869229e3741ebce258</id>
<content type='text'>
The commit cited below removed the RCU read-side critical section from
rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
without RCU protection.

This results in the following warning [1] in the VXLAN driver, which
relied on the callback being invoked from an RCU read-side critical
section.

Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
in the bridge driver.

[1]
WARNING: suspicious RCU usage
5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
-----------------------------
drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by bridge/166:
 #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090

stack backtrace:
CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x100/0x184
 lockdep_rcu_suspicious+0x153/0x15d
 vxlan_fdb_dump+0x51e/0x6d0
 rtnl_fdb_dump+0x4dc/0xad0
 netlink_dump+0x540/0x1090
 __netlink_dump_start+0x695/0x950
 rtnetlink_rcv_msg+0x802/0xbd0
 netlink_rcv_skb+0x17a/0x480
 rtnetlink_rcv+0x22/0x30
 netlink_unicast+0x5ae/0x890
 netlink_sendmsg+0x98a/0xf40
 __sys_sendto+0x279/0x3b0
 __x64_sys_sendto+0xe6/0x1a0
 do_syscall_64+0x54/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe14fa2ade0
Code: Bad RIP value.
RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp_tunnel: add central NIC RX port offload infrastructure</title>
<updated>2020-07-10T20:54:00+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2020-07-10T00:42:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc4e3835eff474aa274d6e1d18f69d9d296d3b76'/>
<id>urn:sha1:cc4e3835eff474aa274d6e1d18f69d9d296d3b76</id>
<content type='text'>
Cater to devices which:
 (a) may want to sleep in the callbacks;
 (b) only have IPv4 support;
 (c) need all the programming to happen while the netdev is up.

Drivers attach UDP tunnel offload info struct to their netdevs,
where they declare how many UDP ports of various tunnel types
they support. Core takes care of tracking which ports to offload.

Use a fixed-size array since this matches what almost all drivers
do, and avoids a complexity and uncertainty around memory allocations
in an atomic context.

Make sure that tunnel drivers don't try to replay the ports when
new NIC netdev is registered. Automatic replays would mess up
reference counting, and will be removed completely once all drivers
are converted.

v4:
 - use a #define NULL to avoid build issues with CONFIG_INET=n.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: fix last fdb index during dump of fdb with nhid</title>
<updated>2020-06-25T23:12:34+00:00</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@cumulusnetworks.com</email>
</author>
<published>2020-06-24T21:02:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b18e9834f7b248b41e8179a039ec80803bb3d67a'/>
<id>urn:sha1:b18e9834f7b248b41e8179a039ec80803bb3d67a</id>
<content type='text'>
This patch fixes last saved fdb index in fdb dump handler when
handling fdb's with nhid.

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: Remove access to nexthop group struct</title>
<updated>2020-06-10T20:20:20+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsahern@kernel.org</email>
</author>
<published>2020-06-09T23:27:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=50cb8769f2c1c657a470bda192b79ff679d0ecfc'/>
<id>urn:sha1:50cb8769f2c1c657a470bda192b79ff679d0ecfc</id>
<content type='text'>
vxlan driver should be using helpers to access nexthop struct
internals. Remove open check if whether nexthop is multipath in
favor of the existing nexthop_is_multipath helper. Add a new
helper, nexthop_has_v4, to cover the need to check has_v4 in
a group.

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Cc: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>nexthop: Fix fdb labeling for groups</title>
<updated>2020-06-10T20:18:40+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsahern@kernel.org</email>
</author>
<published>2020-06-09T02:54:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ce9ac056d9cd15630dfca352ff6d3051ba3ba8f6'/>
<id>urn:sha1:ce9ac056d9cd15630dfca352ff6d3051ba3ba8f6</id>
<content type='text'>
fdb nexthops are marked with a flag. For standalone nexthops, a flag was
added to the nh_info struct. For groups that flag was added to struct
nexthop when it should have been added to the group information. Fix
by removing the flag from the nexthop struct and adding a flag to nh_group
that mirrors nh_info and is really only a caching of the individual types.
Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the
internal code to use the flag from either nh_info or nh_group.

v2
- propagate fdb_nh in remove_nh_grp_entry

Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops")
Cc: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
