<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/core/dev.c, branch linux-5.11.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-05-14T08:50:13+00:00</updated>
<entry>
<title>gro: fix napi_gro_frags() Fast GRO breakage due to IP alignment check</title>
<updated>2021-05-14T08:50:13+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>alobakin@pm.me</email>
</author>
<published>2021-04-19T12:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3fda99f1b933c2b053642ebeadf39a118f6e2d2a'/>
<id>urn:sha1:3fda99f1b933c2b053642ebeadf39a118f6e2d2a</id>
<content type='text'>
[ Upstream commit 7ad18ff6449cbd6beb26b53128ddf56d2685aa93 ]

Commit 38ec4944b593 ("gro: ensure frag0 meets IP header alignment")
did the right thing, but missed the fact that napi_gro_frags() logics
calls for skb_gro_reset_offset() *before* pulling Ethernet header
to the skb linear space.
That said, the introduced check for frag0 address being aligned to 4
always fails for it as Ethernet header is obviously 14 bytes long,
and in case with NET_IP_ALIGN its start is not aligned to 4.

Fix this by adding @nhoff argument to skb_gro_reset_offset() which
tells if an IP header is placed right at the start of frag0 or not.
This restores Fast GRO for napi_gro_frags() that became very slow
after the mentioned commit, and preserves the introduced check to
avoid silent unaligned accesses.

From v1 [0]:
 - inline tiny skb_gro_reset_offset() to let the code be optimized
   more efficively (esp. for the !NET_IP_ALIGN case) (Eric);
 - pull in Reviewed-by from Eric.

[0] https://lore.kernel.org/netdev/20210418114200.5839-1-alobakin@pm.me

Fixes: 38ec4944b593 ("gro: ensure frag0 meets IP header alignment")
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>gro: ensure frag0 meets IP header alignment</title>
<updated>2021-04-21T11:13:27+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-04-13T12:41:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=069013bf21528514f537eca0ec70529db4720132'/>
<id>urn:sha1:069013bf21528514f537eca0ec70529db4720132</id>
<content type='text'>
commit 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 upstream.

After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb-&gt;head")
Guenter Roeck reported one failure in his tests using sh architecture.

After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()

The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.

This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.

Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.

Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.

Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb-&gt;head")
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Cc: Xuan Zhuo &lt;xuanzhuo@linux.alibaba.com&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>can: dev: Move device back to init netns on owning netns delete</title>
<updated>2021-03-30T12:30:30+00:00</updated>
<author>
<name>Martin Willi</name>
<email>martin@strongswan.org</email>
</author>
<published>2021-03-02T12:24:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a675f66e16bfe03fdd04b82dbd2b4c3a4cb80d2'/>
<id>urn:sha1:0a675f66e16bfe03fdd04b82dbd2b4c3a4cb80d2</id>
<content type='text'>
commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream.

When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[&lt;c010e700&gt;] (unwind_backtrace) from [&lt;c010a1d8&gt;] (show_stack+0x10/0x14)
[&lt;c010a1d8&gt;] (show_stack) from [&lt;c086dc10&gt;] (dump_stack+0x94/0xa8)
[&lt;c086dc10&gt;] (dump_stack) from [&lt;c086b938&gt;] (__warn+0xb8/0x114)
[&lt;c086b938&gt;] (__warn) from [&lt;c086ba10&gt;] (warn_slowpath_fmt+0x7c/0xac)
[&lt;c086ba10&gt;] (warn_slowpath_fmt) from [&lt;c0629f20&gt;] (ops_exit_list+0x38/0x60)
[&lt;c0629f20&gt;] (ops_exit_list) from [&lt;c062a5c4&gt;] (cleanup_net+0x230/0x380)
[&lt;c062a5c4&gt;] (cleanup_net) from [&lt;c0142c20&gt;] (process_one_work+0x1d8/0x438)
[&lt;c0142c20&gt;] (process_one_work) from [&lt;c0142ee4&gt;] (worker_thread+0x64/0x5a8)
[&lt;c0142ee4&gt;] (worker_thread) from [&lt;c0148a98&gt;] (kthread+0x148/0x14c)
[&lt;c0148a98&gt;] (kthread) from [&lt;c0100148&gt;] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi &lt;martin@strongswan.org&gt;
Signed-off-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: check all name nodes in __dev_alloc_name</title>
<updated>2021-03-30T12:30:23+00:00</updated>
<author>
<name>Jiri Bohac</name>
<email>jbohac@suse.cz</email>
</author>
<published>2021-03-18T03:42:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a39e9373f6a6ccbca988cc8a74c96762108a2feb'/>
<id>urn:sha1:a39e9373f6a6ccbca988cc8a74c96762108a2feb</id>
<content type='text'>
[ Upstream commit 6c015a2256801597fadcbc11d287774c9c512fa5 ]

__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.

Since commit ff92741270bf8b6e78aa885f166b68c7a67ab13a ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names.  __dev_alloc_name() does take
these alternate names into account, possibly generating a name
that is already taken and failing with -ENFILE as a result.

This demonstrates the bug:

    # rmmod dummy 2&gt;/dev/null
    # ip link property add dev lo altname dummy0
    # modprobe dummy numdummies=1
    modprobe: ERROR: could not insert 'dummy': Too many open files in system

Instead of creating a device named dummy1, modprobe fails.

Fix this by checking all the names in the d-&gt;name_node list, not just d-&gt;name.

Signed-off-by: Jiri Bohac &lt;jbohac@suse.cz&gt;
Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: fix dev_ifsioc_locked() race condition</title>
<updated>2021-03-07T11:35:48+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-02-11T19:34:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d0ae760c02c98fc78b78d3a0509896bc648ad1c'/>
<id>urn:sha1:4d0ae760c02c98fc78b78d3a0509896bc648ad1c</id>
<content type='text'>
commit 3b23a32a63219f51a5298bc55a65ecee866e79d0 upstream.

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev-&gt;dev_addr, addr-&gt;sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr-&gt;ifr_hwaddr.sa_data,
					dev-&gt;dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" &lt;sishuai@purdue.edu&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: gro: do not keep too many GRO packets in napi-&gt;rx_list</title>
<updated>2021-02-06T03:28:01+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-02-04T21:31:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8dc1c444df193701910f5e80b5d4caaf705a8fb0'/>
<id>urn:sha1:8dc1c444df193701910f5e80b5d4caaf705a8fb0</id>
<content type='text'>
Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.

Before the patch, GRO packets were immediately passed to
upper stacks.

After the patch, we can accumulate quite a lot of GRO
packets (depdending on NAPI budget).

My fix is counting in napi-&gt;rx_count number of segments
instead of number of logical packets.

Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Bisected-by: John Sperbeck &lt;jsperbeck@google.com&gt;
Tested-by: Jian Yang &lt;jianyang@google.com&gt;
Cc: Maxim Mikityanskiy &lt;maximmi@mellanox.com&gt;
Reviewed-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Reviewed-by: Edward Cree &lt;ecree.xilinx@gmail.com&gt;
Reviewed-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled</title>
<updated>2021-01-19T23:58:05+00:00</updated>
<author>
<name>Tariq Toukan</name>
<email>tariqt@nvidia.com</email>
</author>
<published>2021-01-17T15:15:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a3eb4e9d4c9218476d05c52dfd2be3d6fdce6b91'/>
<id>urn:sha1:a3eb4e9d4c9218476d05c52dfd2be3d6fdce6b91</id>
<content type='text'>
With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
logically done when RXCSUM offload is off.

Fixes: 14136564c8ee ("net: Add TLS RX offload feature")
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reviewed-by: Boris Pismenny &lt;borisp@nvidia.com&gt;
Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Allow NETIF_F_HW_TLS_TX if IP_CSUM &amp;&amp; IPV6_CSUM</title>
<updated>2021-01-14T18:56:13+00:00</updated>
<author>
<name>Tariq Toukan</name>
<email>tariqt@nvidia.com</email>
</author>
<published>2021-01-14T15:12:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=25537d71e2d007faf42a244a75e5a2bb7c356234'/>
<id>urn:sha1:25537d71e2d007faf42a244a75e5a2bb7c356234</id>
<content type='text'>
Cited patch below blocked the TLS TX device offload unless HW_CSUM
is set. This broke devices that use IP_CSUM &amp;&amp; IP6_CSUM.
Here we fix it.

Note that the single HW_TLS_TX feature flag indicates support for
both IPv4/6, hence it should still be disabled in case only one of
(IP_CSUM | IPV6_CSUM) is set.

Fixes: ae0b04b238e2 ("net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled")
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reported-by: Rohit Maheshwari &lt;rohitm@chelsio.com&gt;
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@mellanox.com&gt;
Link: https://lore.kernel.org/r/20210114151215.7061-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: make sure devices go through netdev_wait_all_refs</title>
<updated>2021-01-09T03:27:41+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-01-06T18:40:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=766b0515d5bec4b780750773ed3009b148df8c0a'/>
<id>urn:sha1:766b0515d5bec4b780750773ed3009b148df8c0a</id>
<content type='text'>
If register_netdevice() fails at the very last stage - the
notifier call - some subsystems may have already seen it and
grabbed a reference. struct net_device can't be freed right
away without calling netdev_wait_all_refs().

Now that we have a clean interface in form of dev-&gt;needs_free_netdev
and lenient free_netdev() we can undo what commit 93ee31f14f6f ("[NET]:
Fix free_netdev on register_netdev failure.") has done and complete
the unregistration path by bringing the net_set_todo() call back.

After registration fails user is still expected to explicitly
free the net_device, so make sure -&gt;needs_free_netdev is cleared,
otherwise rolling back the registration will cause the old double
free for callers who release rtnl_lock before the free.

This also solves the problem of priv_destructor not being called
on notifier error.

net_set_todo() will be moved back into unregister_netdevice_queue()
in a follow up.

Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Reported-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: make free_netdev() more lenient with unregistering devices</title>
<updated>2021-01-09T03:27:41+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-01-06T18:40:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c269a24ce057abfc31130960e96ab197ef6ab196'/>
<id>urn:sha1:c269a24ce057abfc31130960e96ab197ef6ab196</id>
<content type='text'>
There are two flavors of handling netdev registration:
 - ones called without holding rtnl_lock: register_netdev() and
   unregister_netdev(); and
 - those called with rtnl_lock held: register_netdevice() and
   unregister_netdevice().

While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev-&gt;needs_free_netdev to true
and expect core to make the free_netdev() call some time later.

The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.

Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.

Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
