<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/wireguard, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-23T00:40:30+00:00</updated>
<entry>
<title>net: WQ_PERCPU added to alloc_workqueue users</title>
<updated>2025-09-23T00:40:30+00:00</updated>
<author>
<name>Marco Crivellari</name>
<email>marco.crivellari@suse.com</email>
</author>
<published>2025-09-18T14:24:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=27ce71e1ce81875df72f7698ba27988392bef602'/>
<id>urn:sha1:27ce71e1ce81875df72f7698ba27988392bef602</id>
<content type='text'>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag at the network subsystem, to explicitly
request the use of the per-CPU behavior. Both flags coexist for one release
cycle to allow callers to transition their calls.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

All existing users have been updated accordingly.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Link: https://patch.msgid.link/20250918142427.309519-4-marco.crivellari@suse.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>wireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()</title>
<updated>2025-09-12T01:52:21+00:00</updated>
<author>
<name>Yury Norov (NVIDIA)</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-09-10T01:36:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5bd8de20770ca001621bb1aa5eb9a0977d0bd2d9'/>
<id>urn:sha1:5bd8de20770ca001621bb1aa5eb9a0977d0bd2d9</id>
<content type='text'>
The function gets number of online CPUS, and uses it to search for
Nth cpu in cpu_online_mask.

If id == num_online_cpus() - 1, and one CPU gets offlined between
calling num_online_cpus() -&gt; cpumask_nth(), there's a chance for
cpumask_nth() to find nothing and return &gt;= nr_cpu_ids.

The caller code in __queue_work() tries to avoid that by checking the
returned CPU against WORK_CPU_UNBOUND, which is NR_CPUS. It's not the
same as '&gt;= nr_cpu_ids'. On a typical Ubuntu desktop, NR_CPUS is 8192,
while nr_cpu_ids is the actual number of possible CPUs, say 8.

The non-existing cpu may later be passed to rcu_dereference() and
corrupt the logic. Fix it by switching from 'if' to 'while'.

Suggested-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Signed-off-by: Yury Norov (NVIDIA) &lt;yury.norov@gmail.com&gt;
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Link: https://patch.msgid.link/20250910013644.4153708-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>wireguard: queueing: simplify wg_cpumask_next_online()</title>
<updated>2025-09-12T01:52:20+00:00</updated>
<author>
<name>Yury Norov [NVIDIA]</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-09-10T01:36:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5551d21284702a5f36791aecc7735870d0423996'/>
<id>urn:sha1:5551d21284702a5f36791aecc7735870d0423996</id>
<content type='text'>
wg_cpumask_choose_online() opencodes cpumask_nth(). Use it and make the
function significantly simpler. While there, fix opencoded cpu_online()
too.

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Link: https://patch.msgid.link/20250910013644.4153708-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>wireguard: peer: Replace sockaddr with sockaddr_inet</title>
<updated>2025-07-25T22:29:58+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2025-07-22T17:18:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9203e0a82c0b0c4b12d96d6f6f9ef9afbb6ca9c6'/>
<id>urn:sha1:9203e0a82c0b0c4b12d96d6f6f9ef9afbb6ca9c6</id>
<content type='text'>
As part of the removal of the variably-sized sockaddr for kernel
internals, replace struct sockaddr with sockaddr_inet in the endpoint
union.

No binary changes; the union size remains unchanged due to sockaddr_inet
matching the size of sockaddr_in6.

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
Link: https://patch.msgid.link/20250722171836.1078436-2-kees@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Use netif_threaded_enable instead of netif_set_threaded in drivers</title>
<updated>2025-07-25T01:34:55+00:00</updated>
<author>
<name>Samiullah Khawaja</name>
<email>skhawaja@google.com</email>
</author>
<published>2025-07-23T01:30:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=78afdadafe6fe0c74c08fda156e7be0a0b402b90'/>
<id>urn:sha1:78afdadafe6fe0c74c08fda156e7be0a0b402b90</id>
<content type='text'>
Prepare for adding an enum type for NAPI threaded states by adding
netif_threaded_enable API. De-export the existing netif_set_threaded API
and only use it internally. Update existing drivers to use
netif_threaded_enable instead of the de-exported netif_set_threaded.

Note that dev_set_threaded used by mt76 debugfs file is unchanged.

Signed-off-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Link: https://patch.msgid.link/20250723013031.2911384-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: s/dev_set_threaded/netif_set_threaded/</title>
<updated>2025-07-19T00:27:47+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@fomichev.me</email>
</author>
<published>2025-07-17T17:23:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5d4d84618e1aa2c9531afa3a6323f56e1db4dcf7'/>
<id>urn:sha1:5d4d84618e1aa2c9531afa3a6323f56e1db4dcf7</id>
<content type='text'>
Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Note that one dev_set_threaded call still remains in mt76 for debugfs file.

Signed-off-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250717172333.1288349-7-sdf@fomichev.me
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: ipv6: Add a flags argument to ip6tunnel_xmit(), udp_tunnel6_xmit_skb()</title>
<updated>2025-06-18T01:18:45+00:00</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2025-06-16T22:44:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f78c75d84fe83898f0a00658f593d4f17b38cbc6'/>
<id>urn:sha1:f78c75d84fe83898f0a00658f593d4f17b38cbc6</id>
<content type='text'>
ip6tunnel_xmit() erases the contents of the SKB control block. In order to
be able to set particular IP6CB flags on the SKB, add a corresponding
parameter, and propagate it to udp_tunnel6_xmit_skb() as well.

In one of the following patches, VXLAN driver will use this facility to
mark packets as subject to IPv6 multicast routing.

Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Link: https://patch.msgid.link/acb4f9f3e40c3a931236c3af08a720b017fbfbfb.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: ipv4: Add a flags argument to iptunnel_xmit(), udp_tunnel_xmit_skb()</title>
<updated>2025-06-18T01:18:44+00:00</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2025-06-16T22:44:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e3411e326fa48c9be09ba449330352ba698db698'/>
<id>urn:sha1:e3411e326fa48c9be09ba449330352ba698db698</id>
<content type='text'>
iptunnel_xmit() erases the contents of the SKB control block. In order to
be able to set particular IPCB flags on the SKB, add a corresponding
parameter, and propagate it to udp_tunnel_xmit_skb() as well.

In one of the following patches, VXLAN driver will use this facility to
mark packets as subject to IP multicast routing.

Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Acked-by: Antonio Quartulli &lt;antonio@openvpn.net&gt;
Link: https://patch.msgid.link/89c9daf9f2dc088b6b92ccebcc929f51742de91f.1750113335.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>treewide, timers: Rename from_timer() to timer_container_of()</title>
<updated>2025-06-08T07:07:37+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2025-05-09T05:51:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=41cb08555c4164996d67c78b3bf1c658075b75f1'/>
<id>urn:sha1:41cb08555c4164996d67c78b3bf1c658075b75f1</id>
<content type='text'>
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com

</content>
</entry>
<entry>
<title>wireguard: device: enable threaded NAPI</title>
<updated>2025-06-05T14:53:57+00:00</updated>
<author>
<name>Mirco Barone</name>
<email>mirco.barone@polito.it</email>
</author>
<published>2025-06-05T12:06:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e'/>
<id>urn:sha1:db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e</id>
<content type='text'>
Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host.  This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).

The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.

This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.

As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.

In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.

The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.

On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.

We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.

More details are available in our Netdev paper.

Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone &lt;mirco.barone@polito.it&gt;
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Link: https://patch.msgid.link/20250605120616.2808744-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
