<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/sched/sch_generic.c, branch linux-7.1.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-19T21:16:00+00:00</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-03-19T21:16:00+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2026-03-12T19:53:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=edab1ca5ec6fffecbf340e26956ce73e502901d5'/>
<id>urn:sha1:edab1ca5ec6fffecbf340e26956ce73e502901d5</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-7.0-rc5).

net/netfilter/nft_set_rbtree.c
  598adea720b97 ("netfilter: revert nft_set_rbtree: validate open interval overlap")
  3aea466a43998 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock")
https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk

No adjacent changes.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/sched: teql: Fix double-free in teql_master_xmit</title>
<updated>2026-03-17T02:40:32+00:00</updated>
<author>
<name>Jamal Hadi Salim</name>
<email>jhs@mojatatu.com</email>
</author>
<published>2026-03-15T15:54:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=66360460cab63c248ca5b1070a01c0c29133b960'/>
<id>urn:sha1:66360460cab63c248ca5b1070a01c0c29133b960</id>
<content type='text'>
Whenever a TEQL devices has a lockless Qdisc as root, qdisc_reset should
be called using the seq_lock to avoid racing with the datapath. Failure
to do so may cause crashes like the following:

[  238.028993][  T318] BUG: KASAN: double-free in skb_release_data (net/core/skbuff.c:1139)
[  238.029328][  T318] Free of addr ffff88810c67ec00 by task poc_teql_uaf_ke/318
[  238.029749][  T318]
[  238.029900][  T318] CPU: 3 UID: 0 PID: 318 Comm: poc_teql_ke Not tainted 7.0.0-rc3-00149-ge5b31d988a41 #704 PREEMPT(full)
[  238.029906][  T318] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  238.029910][  T318] Call Trace:
[  238.029913][  T318]  &lt;TASK&gt;
[  238.029916][  T318]  dump_stack_lvl (lib/dump_stack.c:122)
[  238.029928][  T318]  print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
[  238.029940][  T318]  ? skb_release_data (net/core/skbuff.c:1139)
[  238.029944][  T318]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
...
[  238.029957][  T318]  ? skb_release_data (net/core/skbuff.c:1139)
[  238.029969][  T318]  kasan_report_invalid_free (mm/kasan/report.c:221 mm/kasan/report.c:563)
[  238.029979][  T318]  ? skb_release_data (net/core/skbuff.c:1139)
[  238.029989][  T318]  check_slab_allocation (mm/kasan/common.c:231)
[  238.029995][  T318]  kmem_cache_free (mm/slub.c:2637 (discriminator 1) mm/slub.c:6168 (discriminator 1) mm/slub.c:6298 (discriminator 1))
[  238.030004][  T318]  skb_release_data (net/core/skbuff.c:1139)
...
[  238.030025][  T318]  sk_skb_reason_drop (net/core/skbuff.c:1256)
[  238.030032][  T318]  pfifo_fast_reset (./include/linux/ptr_ring.h:171 ./include/linux/ptr_ring.h:309 ./include/linux/skb_array.h:98 net/sched/sch_generic.c:827)
[  238.030039][  T318]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
...
[  238.030054][  T318]  qdisc_reset (net/sched/sch_generic.c:1034)
[  238.030062][  T318]  teql_destroy (./include/linux/spinlock.h:395 net/sched/sch_teql.c:157)
[  238.030071][  T318]  __qdisc_destroy (./include/net/pkt_sched.h:328 net/sched/sch_generic.c:1077)
[  238.030077][  T318]  qdisc_graft (net/sched/sch_api.c:1062 net/sched/sch_api.c:1053 net/sched/sch_api.c:1159)
[  238.030089][  T318]  ? __pfx_qdisc_graft (net/sched/sch_api.c:1091)
[  238.030095][  T318]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[  238.030102][  T318]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[  238.030106][  T318]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[  238.030114][  T318]  tc_get_qdisc (net/sched/sch_api.c:1529 net/sched/sch_api.c:1556)
...
[  238.072958][  T318] Allocated by task 303 on cpu 5 at 238.026275s:
[  238.073392][  T318]  kasan_save_stack (mm/kasan/common.c:58)
[  238.073884][  T318]  kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5))
[  238.074230][  T318]  __kasan_slab_alloc (mm/kasan/common.c:369)
[  238.074578][  T318]  kmem_cache_alloc_node_noprof (./include/linux/kasan.h:253 mm/slub.c:4542 mm/slub.c:4869 mm/slub.c:4921)
[  238.076091][  T318]  kmalloc_reserve (net/core/skbuff.c:616 (discriminator 107))
[  238.076450][  T318]  __alloc_skb (net/core/skbuff.c:713)
[  238.076834][  T318]  alloc_skb_with_frags (./include/linux/skbuff.h:1383 net/core/skbuff.c:6763)
[  238.077178][  T318]  sock_alloc_send_pskb (net/core/sock.c:2997)
[  238.077520][  T318]  packet_sendmsg (net/packet/af_packet.c:2926 net/packet/af_packet.c:3019 net/packet/af_packet.c:3108)
[  238.081469][  T318]
[  238.081870][  T318] Freed by task 299 on cpu 1 at 238.028496s:
[  238.082761][  T318]  kasan_save_stack (mm/kasan/common.c:58)
[  238.083481][  T318]  kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5))
[  238.085348][  T318]  kasan_save_free_info (mm/kasan/generic.c:587 (discriminator 1))
[  238.085900][  T318]  __kasan_slab_free (mm/kasan/common.c:287)
[  238.086439][  T318]  kmem_cache_free (mm/slub.c:6168 (discriminator 3) mm/slub.c:6298 (discriminator 3))
[  238.087007][  T318]  skb_release_data (net/core/skbuff.c:1139)
[  238.087491][  T318]  consume_skb (net/core/skbuff.c:1451)
[  238.087757][  T318]  teql_master_xmit (net/sched/sch_teql.c:358)
[  238.088116][  T318]  dev_hard_start_xmit (./include/linux/netdevice.h:5324 ./include/linux/netdevice.h:5333 net/core/dev.c:3871 net/core/dev.c:3887)
[  238.088468][  T318]  sch_direct_xmit (net/sched/sch_generic.c:347)
[  238.088820][  T318]  __qdisc_run (net/sched/sch_generic.c:420 (discriminator 1))
[  238.089166][  T318]  __dev_queue_xmit (./include/net/sch_generic.h:229 ./include/net/pkt_sched.h:121 ./include/net/pkt_sched.h:117 net/core/dev.c:4196 net/core/dev.c:4802)

Workflow to reproduce:
1. Initialize a TEQL topology (dummy0 and ifb0 as slaves, teql0 up).
2. Start multiple sender workers continuously transmitting packets
   through teql0 to drive teql_master_xmit().
3. In parallel, repeatedly delete and re-add the root qdisc on
   dummy0 and ifb0 via RTNETLINK, forcing frequent teardown and reset activity
   (teql_destroy() / qdisc_reset()).
4. After running both workloads concurrently for several iterations,
   KASAN reports slab-use-after-free or double-free in the skb free path.

Fix this by moving dev_reset_queue to sch_generic.h and calling it, instead
of qdisc_reset, in teql_destroy since it handles both the lock and lockless
cases correctly for root qdiscs.

Fixes: 96009c7d500e ("sched: replace __QDISC_STATE_RUNNING bit with a spin lock")
Reported-by: Xianrui Dong &lt;keenanat2000@gmail.com&gt;
Tested-by: Xianrui Dong &lt;keenanat2000@gmail.com&gt;
Co-developed-by: Victor Nogueira &lt;victor@mojatatu.com&gt;
Signed-off-by: Victor Nogueira &lt;victor@mojatatu.com&gt;
Signed-off-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Link: https://patch.msgid.link/20260315155422.147256-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/sched: use rtnl_kfree_skbs() in pfifo_fast_reset()</title>
<updated>2026-03-10T02:01:53+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-03-06T13:31:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=58e4d35ae7b9325ab622bbcc34312b17af425c8f'/>
<id>urn:sha1:58e4d35ae7b9325ab622bbcc34312b17af425c8f</id>
<content type='text'>
rtnl_kfree_skbs() reduces RTNL and qdisc spinlock hold time.

skbs are freed later after RTNL has been released.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20260306133154.678730-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/sched: do not reset queues in graft operations</title>
<updated>2026-03-10T01:55:55+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-03-07T16:34:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=47e8dbb6e763e5ccfed2ab4aa55cbb163382aec1'/>
<id>urn:sha1:47e8dbb6e763e5ccfed2ab4aa55cbb163382aec1</id>
<content type='text'>
Following typical script is extremely disruptive,
because each graft operation calls dev_deactivate()
which resets all the queues of the device.

QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
 tc qd del dev $ETH root 2&gt;/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
   slot=$( printf %x $(( i )) )
   tc qd add dev $ETH parent 1:$slot fq $QPARAM
 done
done

One can add "ip link set dev $ETH down/up" to reduce the disruption time:

QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
 ip link set dev $ETH down
 tc qd del dev $ETH root 2&gt;/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
   slot=$( printf %x $(( i )) )
   tc qd add dev $ETH parent 1:$slot fq $QPARAM
 done
 ip link set dev $ETH up
done

Or we can add a @reset_needed flag to dev_deactivate() and
dev_deactivate_many().

This flag is set to true at device dismantle or linkwatch_do_dev(),
and to false for graft operations.

In the future, we might only stop one queue instead of the whole
device, ie call dev_deactivate_queue() instead of dev_deactivate().

I think the problem (quadratic behavior) was added in commit
2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
for lockless qdisc") but this does not look serious enough to deserve
risky backports.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yunsheng Lin &lt;linyunsheng@huawei.com&gt;
Reviewed-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Reviewed-by: Victor Nogueira &lt;victor@mojatatu.com&gt;
Link: https://patch.msgid.link/20260307163430.470644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: sched: introduce qdisc-specific drop reason tracing</title>
<updated>2026-02-28T23:31:34+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2026-02-26T13:44:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff2998f29f390d963299103f0b247cc79106ced5'/>
<id>urn:sha1:ff2998f29f390d963299103f0b247cc79106ced5</id>
<content type='text'>
Create new enum qdisc_drop_reason and trace_qdisc_drop tracepoint
for qdisc layer drop diagnostics with direct qdisc context visibility.

The new tracepoint includes qdisc handle, parent, kind (name), and
device information. Existing SKB_DROP_REASON_QDISC_DROP is retained
for backwards compatibility via kfree_skb_reason().

Convert qdiscs with drop reasons to use the new infrastructure.

Change CAKE's cobalt_should_drop() return type from enum skb_drop_reason
to enum qdisc_drop_reason to fix implicit enum conversion warnings.
Use QDISC_DROP_UNSPEC as the 'not dropped' sentinel instead of
SKB_NOT_DROPPED_YET. Both have the same compiled value (0), so the
comparison logic remains semantically equivalent.

Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://patch.msgid.link/177211345275.3011628.1974310302645218067.stgit@firesoul
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/sched: don't use dynamic lockdep keys with clsact/ingress/noqueue</title>
<updated>2026-02-05T17:32:45+00:00</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2026-02-04T16:02:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a90f6dcefca6d5ad765435b3188a3a440ed193a1'/>
<id>urn:sha1:a90f6dcefca6d5ad765435b3188a3a440ed193a1</id>
<content type='text'>
Currently we are registering one dynamic lockdep key for each allocated
qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects
packets to another device while the root lock is acquired [1].
Since dynamic keys are a limited resource, we can save them at least for
qdiscs that are not meant to acquire the root lock in the traffic path,
or to carry traffic at all, like:

 - clsact
 - ingress
 - noqueue

Don't register dynamic keys for the above schedulers, so that we hit
MAX_LOCKDEP_KEYS later in our tests.

[1] https://github.com/multipath-tcp/mptcp_net-next/issues/451

Changes in v2:
 - change ordering of spin_lock_init() vs. lockdep_register_key()
   (Jakub Kicinski)

Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-11-13T20:35:38+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-11-13T20:32:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c99ebb6132595b4b288a413981197eb076547c5a'/>
<id>urn:sha1:c99ebb6132595b4b288a413981197eb076547c5a</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-6.18-rc6).

No conflicts, adjacent changes in:

drivers/net/phy/micrel.c
  96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
  61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814")

and a trivial one in tools/testing/selftests/drivers/net/Makefile.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net_sched: limit try_bulk_dequeue_skb() batches</title>
<updated>2025-11-12T01:56:50+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-11-09T16:12:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0345552a653ce5542affeb69ac5aa52177a5199b'/>
<id>urn:sha1:0345552a653ce5542affeb69ac5aa52177a5199b</id>
<content type='text'>
After commit 100dfa74cad9 ("inet: dev_queue_xmit() llist adoption")
I started seeing many qdisc requeues on IDPF under high TX workload.

$ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1:
qdisc mq 1: root
 Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114)
 backlog 1056Kb 6675p requeues 3532840114
qdisc mq 1: root
 Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653)
 backlog 781164b 4822p requeues 3537737653

This is caused by try_bulk_dequeue_skb() being only limited by BQL budget.

perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script
...
 netperf 75332 [146]  2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200
 netperf 75332 [146]  2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500
 netperf 75330 [144]  2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100
 netperf 75333 [147]  2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00
 netperf 75337 [151]  2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300
 netperf 75337 [151]  2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00
 netperf 75330 [144]  2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000
...

This is bad because :

1) Large batches hold one victim cpu for a very long time.

2) Driver often hit their own TX ring limit (all slots are used).

3) We call dev_requeue_skb()

4) Requeues are using a FIFO (q-&gt;gso_skb), breaking qdisc ability to
   implement FQ or priority scheduling.

5) dequeue_skb() gets packets from q-&gt;gso_skb one skb at a time
   with no xmit_more support. This is causing many spinlock games
   between the qdisc and the device driver.

Requeues were supposed to be very rare, lets keep them this way.

Limit batch sizes to /proc/sys/net/core/dev_weight (default 64) as
__qdisc_run() was designed to use.

Fixes: 5772e9a3463b ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Link: https://patch.msgid.link/20251109161215.2574081-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: dev_queue_xmit() llist adoption</title>
<updated>2025-10-16T23:25:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-10-14T17:19:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=100dfa74cad9d4665cdcf0cc8e673b123a3ea910'/>
<id>urn:sha1:100dfa74cad9d4665cdcf0cc8e673b123a3ea910</id>
<content type='text'>
Remove busylock spinlock and use a lockless list (llist)
to reduce spinlock contention to the minimum.

Idea is that only one cpu might spin on the qdisc spinlock,
while others simply add their skb in the llist.

After this patch, we get a 300 % improvement on heavy TX workloads.
- Sending twice the number of packets per second.
- While consuming 50 % less cycles.

Note that this also allows in the future to submit batches
to various qdisc-&gt;enqueue() methods.

Tested:

- Dual Intel(R) Xeon(R) 6985P-C  (480 hyper threads).
- 100Gbit NIC, 30 TX queues with FQ packet scheduler.
- echo 64 &gt;/sys/kernel/slab/skbuff_small_head/cpu_partial (avoid contention in mm)
- 240 concurrent "netperf -t UDP_STREAM -- -m 120 -n"

Before:

16 Mpps (41 Mpps if each thread is pinned to a different cpu)

vmstat 2 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
243  0      0 2368988672  51036 1100852    0    0   146     1  242   60  0  9 91  0  0
244  0      0 2368988672  51036 1100852    0    0   536    10 487745 14718  0 52 48  0  0
244  0      0 2368988672  51036 1100852    0    0   512     0 503067 46033  0 52 48  0  0
244  0      0 2368988672  51036 1100852    0    0   512     0 494807 12107  0 52 48  0  0
244  0      0 2368988672  51036 1100852    0    0   702    26 492845 10110  0 52 48  0  0

Lock contention (1 second sample taken on 8 cores)
perf lock record -C0-7 sleep 1; perf lock contention
 contended   total wait     max wait     avg wait         type   caller

    442111      6.79 s     162.47 ms     15.35 us     spinlock   dev_hard_start_xmit+0xcd
      5961      9.57 ms      8.12 us      1.60 us     spinlock   __dev_queue_xmit+0x3a0
       244    560.63 us      7.63 us      2.30 us     spinlock   do_softirq+0x5b
        13     25.09 us      3.21 us      1.93 us     spinlock   net_tx_action+0xf8

If netperf threads are pinned, spinlock stress is very high.
perf lock record -C0-7 sleep 1; perf lock contention
 contended   total wait     max wait     avg wait         type   caller

    964508      7.10 s     147.25 ms      7.36 us     spinlock   dev_hard_start_xmit+0xcd
       201    268.05 us      4.65 us      1.33 us     spinlock   __dev_queue_xmit+0x3a0
        12     26.05 us      3.84 us      2.17 us     spinlock   do_softirq+0x5b

@__dev_queue_xmit_ns:
[256, 512)            21 |                                                    |
[512, 1K)            631 |                                                    |
[1K, 2K)           27328 |@                                                   |
[2K, 4K)          265392 |@@@@@@@@@@@@@@@@                                    |
[4K, 8K)          417543 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[8K, 16K)         826292 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)        733822 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
[32K, 64K)         19055 |@                                                   |
[64K, 128K)        17240 |@                                                   |
[128K, 256K)       25633 |@                                                   |
[256K, 512K)           4 |                                                    |

After:

29 Mpps (57 Mpps if each thread is pinned to a different cpu)

vmstat 2 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
78  0      0 2369573632  32896 1350988    0    0    22     0  331  254  0  8 92  0  0
75  0      0 2369573632  32896 1350988    0    0    22    50 425713 280199  0 23 76  0  0
104  0      0 2369573632  32896 1350988    0    0   290     0 430238 298247  0 23 76  0  0
86  0      0 2369573632  32896 1350988    0    0   132     0 428019 291865  0 24 76  0  0
90  0      0 2369573632  32896 1350988    0    0   502     0 422498 278672  0 23 76  0  0

perf lock record -C0-7 sleep 1; perf lock contention
 contended   total wait     max wait     avg wait         type   caller

      2524    116.15 ms    486.61 us     46.02 us     spinlock   __dev_queue_xmit+0x55b
      5821    107.18 ms    371.67 us     18.41 us     spinlock   dev_hard_start_xmit+0xcd
      2377      9.73 ms     35.86 us      4.09 us     spinlock   ___slab_alloc+0x4e0
       923      5.74 ms     20.91 us      6.22 us     spinlock   ___slab_alloc+0x5c9
       121      3.42 ms    193.05 us     28.24 us     spinlock   net_tx_action+0xf8
         6    564.33 us    167.60 us     94.05 us     spinlock   do_softirq+0x5b

If netperf threads are pinned (~54 Mpps)
perf lock record -C0-7 sleep 1; perf lock contention
     32907    316.98 ms    195.98 us      9.63 us     spinlock   dev_hard_start_xmit+0xcd
      4507     61.83 ms    212.73 us     13.72 us     spinlock   __dev_queue_xmit+0x554
      2781     23.53 ms     40.03 us      8.46 us     spinlock   ___slab_alloc+0x5c9
      3554     18.94 ms     34.69 us      5.33 us     spinlock   ___slab_alloc+0x4e0
       233      9.09 ms    215.70 us     38.99 us     spinlock   do_softirq+0x5b
       153    930.66 us     48.67 us      6.08 us     spinlock   net_tx_action+0xfd
        84    331.10 us     14.22 us      3.94 us     spinlock   ___slab_alloc+0x5c9
       140    323.71 us      9.94 us      2.31 us     spinlock   ___slab_alloc+0x4e0

@__dev_queue_xmit_ns:
[128, 256)       1539830 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[256, 512)       2299558 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K)         483936 |@@@@@@@@@@                                          |
[1K, 2K)          265345 |@@@@@@                                              |
[2K, 4K)          145463 |@@@                                                 |
[4K, 8K)           54571 |@                                                   |
[8K, 16K)          10270 |                                                    |
[16K, 32K)          9385 |                                                    |
[32K, 64K)          7749 |                                                    |
[64K, 128K)        26799 |                                                    |
[128K, 256K)        2665 |                                                    |
[256K, 512K)         665 |                                                    |

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Tested-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Link: https://patch.msgid.link/20251014171907.3554413-7-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "net/sched: Fix mirred deadlock on device recursion"</title>
<updated>2025-10-16T23:25:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-10-14T17:19:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=178ca30889a13b555dddab7689fd2cc58c8e5dac'/>
<id>urn:sha1:178ca30889a13b555dddab7689fd2cc58c8e5dac</id>
<content type='text'>
This reverts commits 0f022d32c3eca477fbf79a205243a6123ed0fe11
and 44180feaccf266d9b0b28cc4ceaac019817deb5c.

Prior patch in this series implemented loop detection
in act_mirred, we can remove q-&gt;owner to save some cycles
in the fast path.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Reviewed-by: Victor Nogueira &lt;victor@mojatatu.com&gt;
Tested-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Link: https://patch.msgid.link/20251014171907.3554413-5-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
