<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/ipvlan, branch linux-4.20.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-4.20.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-4.20.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-03-19T12:11:55+00:00</updated>
<entry>
<title>ipvlan: disallow userns cap_net_admin to change global mode/flags</title>
<updated>2019-03-19T12:11:55+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-02-19T23:15:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9c68aaee71ce672c4848554f377e6f80138c7d90'/>
<id>urn:sha1:9c68aaee71ce672c4848554f377e6f80138c7d90</id>
<content type='text'>
[ Upstream commit 7cc9f7003a969d359f608ebb701d42cafe75b84a ]

When running Docker with userns isolation e.g. --userns-remap="default"
and spawning up some containers with CAP_NET_ADMIN under this realm, I
noticed that link changes on ipvlan slave device inside that container
can affect all devices from this ipvlan group which are in other net
namespaces where the container should have no permission to make changes
to, such as the init netns, for example.

This effectively allows to undo ipvlan private mode and switch globally to
bridge mode where slaves can communicate directly without going through
hostns, or it allows to switch between global operation mode (l2/l3/l3s)
for everyone bound to the given ipvlan master device. libnetwork plugin
here is creating an ipvlan master and ipvlan slave in hostns and a slave
each that is moved into the container's netns upon creation event.

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: &lt;BROADCAST,MULTICAST,NOARP,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
     ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
     inet 10.41.0.1/32 scope link cilium_host
       valid_lft forever preferred_lft forever
  [...]

* Spawn container &amp; change ipvlan mode setting inside of it:

  # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
  9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: &lt;BROADCAST,MULTICAST,NOARP,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

  # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns (mode switched to l2):

  # ip -d a
  [...]
  8: cilium_host@bond0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

Same l3 -&gt; l2 switch would also happen by creating another slave inside
the container's network namespace when specifying the existing cilium0
link to derive the actual (bond0) master:

  # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  2: cilium1@if4: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
  10: cilium0@if4: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

One way to mitigate it is to check CAP_NET_ADMIN permissions of
the ipvlan master device's ns, and only then allow to change
mode or flags for all devices bound to it. Above two cases are
then disallowed after the patch.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipvlan, l3mdev: fix broken l3s mode wrt local routes</title>
<updated>2019-02-06T16:27:35+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-01-30T11:49:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61d228f94b58e86145011c13dc8929844264e212'/>
<id>urn:sha1:61d228f94b58e86145011c13dc8929844264e212</id>
<content type='text'>
[ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]

While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.

Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net-&gt;loopback_dev as per commit
f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net-&gt;loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.

Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s soley distinguishes itself by
'de-confusing' netfilter through switching skb-&gt;dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.

Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev-&gt;priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.

  [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Mahesh Bandewar &lt;maheshb@google.com&gt;
Cc: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Martynas Pumputis &lt;m@lambda.lt&gt;
Acked-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipvlan: call dev_change_flags when ipvlan mode is reset</title>
<updated>2018-07-02T11:38:09+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2018-07-01T08:21:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5dc2d3996a8b221c20dd0900bdad45031a572530'/>
<id>urn:sha1:5dc2d3996a8b221c20dd0900bdad45031a572530</id>
<content type='text'>
After we change the ipvlan mode from l3 to l2, or vice versa, we only
reset IFF_NOARP flag, but don't flush the ARP table cache, which will
cause eth-&gt;h_dest to be equal to eth-&gt;h_source in ipvlan_xmit_mode_l2().
Then the message will not come out of host.

Here is the reproducer on local host:

ip link set eth1 up
ip addr add 192.168.1.1/24 dev eth1
ip link add link eth1 ipvlan1 type ipvlan mode l3

ip netns add net1
ip link set ipvlan1 netns net1
ip netns exec net1 ip link set ipvlan1 up
ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1

ip route add 192.168.2.0/24 via 192.168.1.2
ping 192.168.2.2 -c 2

ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
ping 192.168.2.2 -c 2

Add the same configuration on remote host. After we set the mode to l2,
we could find that the src/dst MAC addresses are the same on eth1:

21:26:06.648565 00:b7:13:ad:d3:05 &gt; 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.2.1 &gt; 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64

Fix this by calling dev_change_flags(), which will call netdevice notifier
with flag change info.

v2:
a) As pointed out by Wang Cong, check return value for dev_change_flags() when
change dev flags.
b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.

Reported-by: Jianlin Shi &lt;jishi@redhat.com&gt;
Reviewed-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Fixes: 2ad7bf3638411 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvlan: fix IFLA_MTU ignored on NEWLINK</title>
<updated>2018-06-21T05:57:46+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2018-06-21T04:56:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=30877961b1cdd6fdca783c2e8c4f0f47e95dc58c'/>
<id>urn:sha1:30877961b1cdd6fdca783c2e8c4f0f47e95dc58c</id>
<content type='text'>
Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted
the mtu from the master device when creating a ipvlan device, but it
would also override the mtu value set in rtnl_create_link. It causes
IFLA_MTU param not to take effect.

So this patch is to not adjust the mtu if IFLA_MTU param is set when
creating a ipvlan device.

Fixes: 296d48568042 ("ipvlan: inherit MTU from master device")
Reported-by: Jianlin Shi &lt;jishi@redhat.com&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvlan: use ETH_MAX_MTU as max mtu</title>
<updated>2018-06-19T22:53:46+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2018-06-18T08:15:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=548feb33c598dfaf9f8e066b842441ac49b84a8a'/>
<id>urn:sha1:548feb33c598dfaf9f8e066b842441ac49b84a8a</id>
<content type='text'>
Similar to the fixes on team and bonding, this restores the ability
to set an ipvlan device's mtu to anything higher than 1500.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvlan: call netdevice notifier when master mac address changed</title>
<updated>2018-05-16T15:59:41+00:00</updated>
<author>
<name>Keefe Liu</name>
<email>liuqifa@huawei.com</email>
</author>
<published>2018-05-14T11:38:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ab452c3ce7bacb27ffe2fc0144aecd0c399e1e24'/>
<id>urn:sha1:ab452c3ce7bacb27ffe2fc0144aecd0c399e1e24</id>
<content type='text'>
When master device's mac has been changed, the commit
32c10bbfe914 ("ipvlan: always use the current L2 addr of the
master") makes the IPVlan devices's mac changed also, but it
doesn't do related works such as flush the IPVlan devices's
arp table.

Signed-off-by: Keefe Liu &lt;liuqifa@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Drop pernet_operations::async</title>
<updated>2018-03-27T17:18:09+00:00</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-03-27T15:02:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f635ceeb22ba13c307236d69795fbb29cfa3e7c'/>
<id>urn:sha1:2f635ceeb22ba13c307236d69795fbb29cfa3e7c</id>
<content type='text'>
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: introduce IFF_NO_RX_HANDLER</title>
<updated>2018-03-09T18:00:08+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-03-09T09:39:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f5426250a6ecfd1e9b2d5e0daf07565f664aa67d'/>
<id>urn:sha1:f5426250a6ecfd1e9b2d5e0daf07565f664aa67d</id>
<content type='text'>
Some network devices - notably ipvlan slave - are not compatible with
any kind of rx_handler. Currently the hook can be installed but any
configuration (bridge, bond, macsec, ...) is nonfunctional.

This change allocates a priv_flag bit to mark such devices and explicitly
forbid installing a rx_handler if such bit is set. The new bit is used
by ipvlan slave device.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvlan: properly annotate rx_handler access</title>
<updated>2018-03-08T16:12:43+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-03-08T09:29:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ae5799dc7ec77fe0382d58cdcba9f6d1204157ae'/>
<id>urn:sha1:ae5799dc7ec77fe0382d58cdcba9f6d1204157ae</id>
<content type='text'>
The rx_handler field is rcu-protected, but I forgot to use the
proper accessor while refactoring netif_is_ipvlan_port(). Such
function only check the rx_handler value, so it is safe, but we need
to properly read rx_handler via rcu_access_pointer() to avoid sparse
warnings.

Fixes: 1ec54cb44e67 ("net: unpollute priv_flags space")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: unpollute priv_flags space</title>
<updated>2018-03-07T17:38:24+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-03-06T09:56:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ec54cb44e6731c3cb251bcf9251d65a4b4f6306'/>
<id>urn:sha1:1ec54cb44e6731c3cb251bcf9251d65a4b4f6306</id>
<content type='text'>
the ipvlan device driver defines and uses 2 bits inside the priv_flags
net_device field. Such bits and the related helper are used only
inside the ipvlan device driver, and the core networking does not
need to be aware of them.

This change moves netif_is_ipvlan* helper in the ipvlan driver and
re-implement them looking for ipvlan specific symbols instead of
using priv_flags.

Overall this frees two bits inside priv_flags - and move the following
ones to avoid gaps - without any intended functional change.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
