<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/veth.c, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-18T00:22:18+00:00</updated>
<entry>
<title>veth: fix data race in veth_get_ethtool_stats</title>
<updated>2026-01-18T00:22:18+00:00</updated>
<author>
<name>David Yang</name>
<email>mmyangfl@gmail.com</email>
</author>
<published>2026-01-14T12:24:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b47adaab8b3d443868096bac08fdbb3d403194ba'/>
<id>urn:sha1:b47adaab8b3d443868096bac08fdbb3d403194ba</id>
<content type='text'>
In veth_get_ethtool_stats(), some statistics protected by
u64_stats_sync, are read and accumulated in ignorance of possible
u64_stats_fetch_retry() events. These statistics, peer_tq_xdp_xmit and
peer_tq_xdp_xmit_err, are already accumulated by veth_xdp_xmit(). Fix
this by reading them into a temporary buffer first.

Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
Signed-off-by: David Yang &lt;mmyangfl@gmail.com&gt;
Link: https://patch.msgid.link/20260114122450.227982-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-11-27T20:19:08+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-11-27T20:16:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db4029859d6fd03f0622d394f4cdb1be86d7ec62'/>
<id>urn:sha1:db4029859d6fd03f0622d394f4cdb1be86d7ec62</id>
<content type='text'>
Conflicts:

net/xdp/xsk.c
  0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
  8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb")
  30ed05adca4a ("xsk: use a smaller new lock for shared pool case")
https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au
https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>veth: reduce XDP no_direct return section to fix race</title>
<updated>2025-11-21T02:44:32+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2025-11-19T16:28:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a14602fcae17a3f1cb8a8521bedf31728f9e7e39'/>
<id>urn:sha1:a14602fcae17a3f1cb8a8521bedf31728f9e7e39</id>
<content type='text'>
As explain in commit fa349e396e48 ("veth: Fix race with AF_XDP exposing
old or uninitialized descriptors") for veth there is a chance after
napi_complete_done() that another CPU can manage start another NAPI
instance running veth_pool(). For NAPI this is correctly handled as the
napi_schedule_prep() check will prevent multiple instances from getting
scheduled, but for the remaining code in veth_pool() this can run
concurrent with the newly started NAPI instance.

The problem/race is that xdp_clear_return_frame_no_direct() isn't
designed to be nested.

Prior to commit 401cb7dae813 ("net: Reference bpf_redirect_info via
task_struct on PREEMPT_RT.") the temporary BPF net context
bpf_redirect_info was stored per CPU, where this wasn't an issue. Since
this commit the BPF context is stored in 'current' task_struct. When
running veth in threaded-NAPI mode, then the kthread becomes the storage
area. Now a race exists between two concurrent veth_pool() function calls
one exiting NAPI and one running new NAPI, both using the same BPF net
context.

Race is when another CPU gets within the xdp_set_return_frame_no_direct()
section before exiting veth_pool() calls the clear-function
xdp_clear_return_frame_no_direct().

Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Link: https://patch.msgid.link/176356963888.337072.4805242001928705046.stgit@firesoul
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-11-20T17:13:26+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-11-20T17:12:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e203721ec6117b2f4436a26662413764fd669f0'/>
<id>urn:sha1:9e203721ec6117b2f4436a26662413764fd669f0</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-6.18-rc7).

No conflicts, adjacent changes:

tools/testing/selftests/net/af_unix/Makefile
  e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
  45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>veth: more robust handing of race to avoid txq getting stuck</title>
<updated>2025-11-15T02:16:53+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2025-11-12T13:13:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5442a9da69789741bfda39f34ee7f69552bf0c56'/>
<id>urn:sha1:5442a9da69789741bfda39f34ee7f69552bf0c56</id>
<content type='text'>
Commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
reduce TX drops") introduced a race condition that can lead to a permanently
stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra
Max).

The race occurs in veth_xmit(). The producer observes a full ptr_ring and
stops the queue (netif_tx_stop_queue()). The subsequent conditional logic,
intended to re-wake the queue if the consumer had just emptied it (if
(__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a
"lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and
traffic halts.

This failure is caused by an incorrect use of the __ptr_ring_empty() API
from the producer side. As noted in kernel comments, this check is not
guaranteed to be correct if a consumer is operating on another CPU. The
empty test is based on ptr_ring-&gt;consumer_head, making it reliable only for
the consumer. Using this check from the producer side is fundamentally racy.

This patch fixes the race by adopting the more robust logic from an earlier
version V4 of the patchset, which always flushed the peer:

(1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier
are removed. Instead, after stopping the queue, we unconditionally call
__veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled,
making it solely responsible for re-waking the TXQ.
  This handles the race where veth_poll() consumes all packets and completes
NAPI *before* veth_xmit() on the producer side has called netif_tx_stop_queue.
The __veth_xdp_flush(rq) will observe rx_notify_masked is false and schedule
NAPI.

(2) On the consumer side, the logic for waking the peer TXQ is moved out of
veth_xdp_rcv() and placed at the end of the veth_poll() function. This
placement is part of fixing the race, as the netif_tx_queue_stopped() check
must occur after rx_notify_masked is potentially set to false during NAPI
completion.
  This handles the race where veth_poll() consumes all packets, but haven't
finished (rx_notify_masked is still true). The producer veth_xmit() stops the
TXQ and __veth_xdp_flush(rq) will observe rx_notify_masked is true, meaning
not starting NAPI.  Then veth_poll() change rx_notify_masked to false and
stops NAPI.  Before exiting veth_poll() will observe TXQ is stopped and wake
it up.

Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Reviewed-by: Toshiaki Makita &lt;toshiaki.makita1@gmail.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Link: https://patch.msgid.link/176295323282.307447.14790015927673763094.stgit@firesoul
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>veth: Fix a typo error in veth</title>
<updated>2025-11-05T01:00:50+00:00</updated>
<author>
<name>Chu Guangqing</name>
<email>chuguangqing@inspur.com</email>
</author>
<published>2025-11-03T05:53:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9781642e58903e52f3c607e957e5220e95e792b6'/>
<id>urn:sha1:9781642e58903e52f3c607e957e5220e95e792b6</id>
<content type='text'>
Fix a spellling error for resources

Signed-off-by: Chu Guangqing &lt;chuguangqing@inspur.com&gt;
Reviewed-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
Link: https://patch.msgid.link/20251103055351.3150-1-chuguangqing@inspur.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>veth: prevent NULL pointer dereference in veth_xdp_rcv</title>
<updated>2025-06-12T15:08:32+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2025-06-11T12:40:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9337c54401a5bb6ac3c9f6c71dd2a9130cfba82e'/>
<id>urn:sha1:9337c54401a5bb6ac3c9f6c71dd2a9130cfba82e</id>
<content type='text'>
The veth peer device is RCU protected, but when the peer device gets
deleted (veth_dellink) then the pointer is assigned NULL (via
RCU_INIT_POINTER).

This patch adds a necessary NULL check in veth_xdp_rcv when accessing
the veth peer net_device.

This fixes a bug introduced in commit dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops"). The bug is a race
and only triggers when having inflight packets on a veth that is being
deleted.

Reported-by: Ihor Solodrai &lt;ihor.solodrai@linux.dev&gt;
Closes: https://lore.kernel.org/all/fecfcad0-7a16-42b8-bff2-66ee83a6e5c4@linux.dev/
Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/683da55e.a00a0220.d8eae.0052.GAE@google.com/
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Acked-by: Ihor Solodrai &lt;ihor.solodrai@linux.dev&gt;
Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>veth: apply qdisc backpressure on full ptr_ring to reduce TX drops</title>
<updated>2025-04-28T21:06:58+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2025-04-25T14:55:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc82a33297fc2c58cb0b2b008d728668d45c0f6a'/>
<id>urn:sha1:dc82a33297fc2c58cb0b2b008d728668d45c0f6a</id>
<content type='text'>
In production, we're seeing TX drops on veth devices when the ptr_ring
fills up. This can occur when NAPI mode is enabled, though it's
relatively rare. However, with threaded NAPI - which we use in
production - the drops become significantly more frequent.

The underlying issue is that with threaded NAPI, the consumer often runs
on a different CPU than the producer. This increases the likelihood of
the ring filling up before the consumer gets scheduled, especially under
load, leading to drops in veth_xmit() (ndo_start_xmit()).

This patch introduces backpressure by returning NETDEV_TX_BUSY when the
ring is full, signaling the qdisc layer to requeue the packet. The txq
(netdev queue) is stopped in this condition and restarted once
veth_poll() drains entries from the ring, ensuring coordination between
NAPI and qdisc.

Backpressure is only enabled when a qdisc is attached. Without a qdisc,
the driver retains its original behavior - dropping packets immediately
when the ring is full. This avoids unexpected behavior changes in setups
without a configured qdisc.

With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management
(AQM) to fairly schedule packets across flows and reduce collateral
damage from elephant flows.

A known limitation of this approach is that the full ring sits in front
of the qdisc layer, effectively forming a FIFO buffer that introduces
base latency. While AQM still improves fairness and mitigates flow
dominance, the latency impact is measurable.

In hardware drivers, this issue is typically addressed using BQL (Byte
Queue Limits), which tracks in-flight bytes needed based on physical link
rate. However, for virtual drivers like veth, there is no fixed bandwidth
constraint - the bottleneck is CPU availability and the scheduler's ability
to run the NAPI thread. It is unclear how effective BQL would be in this
context.

This patch serves as a first step toward addressing TX drops. Future work
may explore adapting a BQL-like mechanism to better suit virtual devices
like veth.

Reported-by: Yan Zhai &lt;yan@cloudflare.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Toshiaki Makita &lt;toshiaki.makita1@gmail.com&gt;
Link: https://patch.msgid.link/174559294022.827981.1282809941662942189.stgit@firesoul
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: move misc netdev_lock flavors to a separate header</title>
<updated>2025-03-08T17:06:50+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-03-07T18:30:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ef890df4031121a94407c84659125cbccd3fdbe'/>
<id>urn:sha1:8ef890df4031121a94407c84659125cbccd3fdbe</id>
<content type='text'>
Move the more esoteric helpers for netdev instance lock to
a dedicated header. This avoids growing netdevice.h to infinity
and makes rebuilding the kernel much faster (after touching
the header with the helpers).

The main netdev_lock() / netdev_unlock() functions are used
in static inlines in netdevice.h and will probably be used
most commonly, so keep them in netdevice.h.

Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()</title>
<updated>2025-02-27T13:03:52+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2025-02-25T17:17:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c5bf4de975dd4d493cea3567703404819c81425'/>
<id>urn:sha1:1c5bf4de975dd4d493cea3567703404819c81425</id>
<content type='text'>
Now that we can bulk-allocate skbs from the NAPI cache, use that
function to do that in veth as well instead of direct allocation from
the kmem caches. veth uses NAPI and GRO, so this is both context-safe
and beneficial.

Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
</feed>
