<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/core/rtnetlink.c, branch linux-5.11.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-03-07T11:35:48+00:00</updated>
<entry>
<title>net: fix dev_ifsioc_locked() race condition</title>
<updated>2021-03-07T11:35:48+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-02-11T19:34:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d0ae760c02c98fc78b78d3a0509896bc648ad1c'/>
<id>urn:sha1:4d0ae760c02c98fc78b78d3a0509896bc648ad1c</id>
<content type='text'>
commit 3b23a32a63219f51a5298bc55a65ecee866e79d0 upstream.

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev-&gt;dev_addr, addr-&gt;sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr-&gt;ifr_hwaddr.sa_data,
					dev-&gt;dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" &lt;sishuai@purdue.edu&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: make free_netdev() more lenient with unregistering devices</title>
<updated>2021-01-09T03:27:41+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-01-06T18:40:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c269a24ce057abfc31130960e96ab197ef6ab196'/>
<id>urn:sha1:c269a24ce057abfc31130960e96ab197ef6ab196</id>
<content type='text'>
There are two flavors of handling netdev registration:
 - ones called without holding rtnl_lock: register_netdev() and
   unregister_netdev(); and
 - those called with rtnl_lock held: register_netdevice() and
   unregister_netdevice().

While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev-&gt;needs_free_netdev to true
and expect core to make the free_netdev() call some time later.

The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.

Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.

Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>docs: net: explain struct net_device lifetime</title>
<updated>2021-01-09T03:27:41+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-01-06T18:40:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b446e650b418f9a9e75f99852e2f2560cabfa17'/>
<id>urn:sha1:2b446e650b418f9a9e75f99852e2f2560cabfa17</id>
<content type='text'>
Explain the two basic flows of struct net_device's operation.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>rtnetlink: RCU-annotate both dimensions of rtnl_msg_handlers</title>
<updated>2020-12-10T21:35:59+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2020-12-10T02:16:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=51e13685bd93654e0e9b2559c8e103d6545ddf95'/>
<id>urn:sha1:51e13685bd93654e0e9b2559c8e103d6545ddf95</id>
<content type='text'>
We use rcu_assign_pointer to assign both the table and the entries,
but the entries are not marked as __rcu. This generates sparse
warnings.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: rename nla_strlcpy to nla_strscpy.</title>
<updated>2020-11-16T16:08:54+00:00</updated>
<author>
<name>Francis Laniel</name>
<email>laniel_francis@privacyrequired.com</email>
</author>
<published>2020-11-15T17:08:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=872f690341948b502c93318f806d821c56772c42'/>
<id>urn:sha1:872f690341948b502c93318f806d821c56772c42</id>
<content type='text'>
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel &lt;laniel_francis@privacyrequired.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>rtnetlink: fix data overflow in rtnl_calcit()</title>
<updated>2020-10-22T01:24:08+00:00</updated>
<author>
<name>Di Zhu</name>
<email>zhudi21@huawei.com</email>
</author>
<published>2020-10-21T02:00:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ebfe3c5183733f784264450a41646a482f964e5e'/>
<id>urn:sha1:ebfe3c5183733f784264450a41646a482f964e5e</id>
<content type='text'>
"ip addr show" command execute error when we have a physical
network card with a large number of VFs

The return value of if_nlmsg_size() in rtnl_calcit() will exceed
range of u16 data type when any network cards has a larger number of
VFs. rtnl_vfinfo_size() will significant increase needed dump size when
the value of num_vfs is larger.

Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
which decides the memory size needed by netlink dump and netlink_dump()
will return -EMSGSIZE because of not enough memory was allocated.

So fix it by promoting  min_dump_alloc data type to u32 to
avoid whole netlink message size overflow and it's also align
with the data type of struct netlink_callback{}.min_dump_alloc
which is assigned by return value of rtnl_calcit()

Signed-off-by: Di Zhu &lt;zhudi21@huawei.com&gt;
Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2020-08-04T01:27:40+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-08-04T01:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2e7199bd773bff3220184d071ed9c9cd34950e51'/>
<id>urn:sha1:2e7199bd773bff3220184d071ed9c9cd34950e51</id>
<content type='text'>
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-08-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

The main changes are:

1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
   syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
   in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
   unwinder errors, from Song Liu.

4) Allow cgroup local storage map to be shared between programs on the same
   cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
   load instructions, from Jean-Philippe Brucker.

6) Follow-up fixes on BPF socket lookup in combination with reuseport group
   handling. Also add related BPF selftests, from Jakub Sitnicki.

7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
   socket create/release as well as bind functions, from Stanislav Fomichev.

8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
   xdp_statistics, from Peilin Ye.

9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

10) Extend BPF kernel test infra with skb-&gt;family and skb-&gt;{local,remote}_ip{4,6}
    fields and allow user space to specify skb-&gt;dev via ifindex, from Dmitry Yakunin.

11) Fix a bpftool segfault due to missing program type name and make it more robust
    to prevent them in future gaps, from Quentin Monnet.

12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
    resolver issue, from John Fastabend.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rtnetlink: add support for protodown reason</title>
<updated>2020-08-01T01:49:16+00:00</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@cumulusnetworks.com</email>
</author>
<published>2020-08-01T00:34:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=829eb208e80d6db95c0201cb8fa00c2f9ad87faf'/>
<id>urn:sha1:829eb208e80d6db95c0201cb8fa00c2f9ad87faf</id>
<content type='text'>
netdev protodown is a mechanism that allows protocols to
hold an interface down. It was initially introduced in
the kernel to hold links down by a multihoming protocol.
There was also an attempt to introduce protodown
reason at the time but was rejected. protodown and protodown reason
is supported by almost every switching and routing platform.
It was ok for a while to live without a protodown reason.
But, its become more critical now given more than
one protocol may need to keep a link down on a system
at the same time. eg: vrrp peer node, port security,
multihoming protocol. Its common for Network operators and
protocol developers to look for such a reason on a networking
box (Its also known as errDisable by most networking operators)

This patch adds support for link protodown reason
attribute. There are two ways to maintain protodown
reasons.
(a) enumerate every possible reason code in kernel
    - A protocol developer has to make a request and
      have that appear in a certain kernel version
(b) provide the bits in the kernel, and allow user-space
(sysadmin or NOS distributions) to manage the bit-to-reasonname
map.
	- This makes extending reason codes easier (kind of like
      the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)

This patch takes approach (b).

a few things about the patch:
- It treats the protodown reason bits as counter to indicate
active protodown users
- Since protodown attribute is already an exposed UAPI,
the reason is not enforced on a protodown set. Its a no-op
if not used.
the patch follows the below algorithm:
  - presence of reason bits set indicates protodown
    is in use
  - user can set protodown and protodown reason in a
    single or multiple setlink operations
  - setlink operation to clear protodown, will return -EBUSY
    if there are active protodown reason bits
  - reason is not included in link dumps if not used

example with patched iproute2:
$cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ip link set dev vxlan0 protodown on protodown_reason vrrp on
$ip link set dev vxlan0 protodown_reason mlag on
$ip link show
14: vxlan0: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on &lt;mlag,vrrp&gt;

$ip link set dev vxlan0 protodown_reason mlag off
$ip link set dev vxlan0 protodown off protodown_reason vrrp off

Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, xdp: Maintain info on attached XDP BPF programs in net_device</title>
<updated>2020-07-26T03:37:02+00:00</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andriin@fb.com</email>
</author>
<published>2020-07-22T06:45:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7f0a838254bdd9114b978ef2541a6ce330307e9e'/>
<id>urn:sha1:7f0a838254bdd9114b978ef2541a6ce330307e9e</id>
<content type='text'>
Instead of delegating to drivers, maintain information about which BPF
programs are attached in which XDP modes (generic/skb, driver, or hardware)
locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.

Such re-organization simplifies existing code already. But it also allows to
further add bpf_link-based XDP attachments without drivers having to know
about any of this at all, which seems like a good setup.
XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to
install/uninstall active BPF program. All the higher-level concerns about
prog/link interaction will be contained within generic driver-agnostic logic.

All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed.
It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags
when resetting installed programs. That seems unnecessary, plus most drivers
don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
should be enough of an indicator of what is required of driver to correctly
reset active BPF program. dev_xdp_uninstall() is also generalized as an
iteration over all three supported mode.

Signed-off-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com
</content>
</entry>
<entry>
<title>rtnetlink: Fix memory(net_device) leak when -&gt;newlink fails</title>
<updated>2020-07-17T19:33:18+00:00</updated>
<author>
<name>Weilong Chen</name>
<email>chenweilong@huawei.com</email>
</author>
<published>2020-07-15T12:58:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cebb69754f37d68e1355a5e726fdac317bcda302'/>
<id>urn:sha1:cebb69754f37d68e1355a5e726fdac317bcda302</id>
<content type='text'>
When vlan_newlink call register_vlan_dev fails, it might return error
with dev-&gt;reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.

BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
  comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
  hex dump (first 32 bytes):
    76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00  vlan2...........
    00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00  .E(.............
  backtrace:
    [&lt;0000000047527e31&gt;] kmalloc_node include/linux/slab.h:578 [inline]
    [&lt;0000000047527e31&gt;] kvmalloc_node+0x33/0xd0 mm/util.c:574
    [&lt;000000002b59e3bc&gt;] kvmalloc include/linux/mm.h:753 [inline]
    [&lt;000000002b59e3bc&gt;] kvzalloc include/linux/mm.h:761 [inline]
    [&lt;000000002b59e3bc&gt;] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
    [&lt;000000006076752a&gt;] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
    [&lt;00000000572b3be5&gt;] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
    [&lt;00000000e84ea553&gt;] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
    [&lt;0000000052c7c0a9&gt;] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
    [&lt;000000004b5cb379&gt;] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
    [&lt;00000000c71c20d3&gt;] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    [&lt;00000000c71c20d3&gt;] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
    [&lt;00000000cca72fa9&gt;] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
    [&lt;000000009221ebf7&gt;] sock_sendmsg_nosec net/socket.c:652 [inline]
    [&lt;000000009221ebf7&gt;] sock_sendmsg+0x109/0x140 net/socket.c:672
    [&lt;000000001c30ffe4&gt;] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
    [&lt;00000000b71ca6f3&gt;] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
    [&lt;0000000007297384&gt;] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
    [&lt;000000000eb29b11&gt;] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
    [&lt;000000006839b4d0&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
